STAT 231 Winter 2024

Assignment 3

Assignment 3 is due on Tuesday February 27 at 11:00am Eastern Time. Your assignment must be typed. You may create your document in Word, Google Docs, LaTeX or any other word processor. The requirement to type your assignment is to facilitate the grading so that the marked assignments can be returned to you in a timely fashion. It is also useful for you to gain some experience in creating a document containing mathematical expressions. Two documents have been posted in the Assignment 1 folder in LEARN on how to use the equation editor in Word. If you wish to use LaTeX then you may find Overleaf particularly useful for this. See https://www.overleaf.com/edu/uwaterloo

Upload your assignment to Crowdmark as a pdf file. You can upload your assignment as one document or individually for each problem. If you upload one document then you must drag and drop the pages for each problem to the appropriate question as indicated in Crowdmark. This is extremely important since dealing with assignments which are left as one document requires extra time and effort by the markers. Be sure to upload your assignment well in advance of the due time since uploading an assignment of many pages to Crowdmark requires time.

In addition to submitting your assignment component to Crowdmark, you must submit your assignment as a single pdf document to the Assignment 3 Dropbox in LEARN to facilitate the running of your assignment through plagiarism detection software. Your submissions to Crowdmark and the LEARN Dropbox must be identical. Please do not include these two pages of information or any instructions given for each problem in your assignment submission to Crowdmark and the LEARN Dropbox. Doing so means that your assignment is flagged by the Turnitin software used for checking plagiarism.

Many problems on this assignment indicate that your answers must be given in sentences. This course emphasizes learning to communicate statistical concepts in sentences.

In some of the problems on this assignment you are asked to use R. Only the answers/results you obtain using R must be included in your Crowdmark pdf submission. Your R code must be uploaded as an R file to the Assignment 3 R Code Dropbox in LEARN. Effectively commenting your code is an important skill to develop. Markers will review your file and run it to verify that the answers match those in your Crowdmark submission and that the code runs without error. Your code must correctly find the answers needed to get the marks associated with the problems. Good commenting will allow the marker to more easily assign you a full score when reviewing your file. Please ensure your code submitted in the R file is well commented.

Penalties:

(1) Answers which are not typed will not be marked and will receive a mark of zero.

(2) An assignment which is uploaded late to Crowdmark will be assigned a penalty of 5% per hour.

(3) An assignment which is left as a single document and not uploaded to the appropriate places in Crowdmark will be assigned a 10% overall penalty.

(4) An assignment which is submitted late to the Assignment 3 Dropbox in LEARN will be assigned a 5% overall penalty.

(5) If the file of R code is submitted late to the Assignment 3 R Code Dropbox in LEARN, then the assignment will be assigned a 5% overall penalty.

(6) Answers which are required to be written in sentences but are not in sentences will be assigned a 5% overall penalty.

(7) Assignments which include R code in the Crowdmark submission will be assigned a 5% overall penalty.

Checklist to complete for this assignment:

Upload the pdf of your assignment to Crowdmark by the deadline.

Upload the pdf file of your assignment to the Assignment 3 Dropbox in LEARN by the deadline.

Upload the R file of your R code to the Assignment 3 R Code Dropbox in LEARN by the deadline.

This assignment is based on the material in Chapters 1-3 and Sections 4.1-4.3 of the STAT 231 Course Notes.

Assignment 3 Learning Outcomes

Here are the intended learning outcomes for this assignment component. Try to identify the learning outcomes which are achieved by each of the given problems.

Enjoy!

· Apply the steps of PPDAC to critically assess an empirical study.

· Identify and understand the factors which affect the properties of the sampling distribution of the sample mean.

· Identify and understand the factors which affect the properties of likelihood intervals.

· Use numerical and graphical summaries to assess the fit of a specified probability model for the data.

· Use likelihood intervals (interval estimates) to assess the uncertainty in the estimation of an unknown parameter.

Problem 1: PPDAC

The purpose of this problem is to construct a PPDAC based on the study described in the media article below.

See Section 3.2 of the Course Notes as well as the Chapter 3 Problems.

CTV News: Sleeping 5 hours or fewer every night could put you at risk of multiple chronic diseases: study by Alexandra Mae Jones

https://www.ctvnews.ca/health/sleeping-5-hours-or-fewer-every-night-could-put-you-at-risk-of-multiple-chronic-diseases-study-1.6121756

Date: Oct. 26, 2022

A new study using data that spans 25 years has found that getting just five hours of sleep or fewer every night is associated with a higher likelihood of being diagnosed with multiple chronic diseases. The study, which looked at the sleep duration of more than 7,000 participants at the ages of 50, 60 and 70, was published Tuesday in the peer-reviewed journal PLOS Medicine.

Those who reported regularly getting five hours of sleep or fewer at age 50 were 40 per cent more likely to have been diagnosed with two or more chronic diseases over the past 25 years, compared to people who slept around seven hours a night, the study found.

Severine Sabia of University College London’s Institute of Epidemiology & Health and the lead author of the study, said in a press release that “as people get older, their sleep habits and sleep structure change.” But getting seven to eight hours each night is still recommended, regardless of age.

Previous research has suggested that sleep durations above or below this recommended level may be associated with individual chronic diseases, Sabia noted.

Sabia and her team set out to investigate whether there was an association with less sleep and the risk of developing multiple chronic conditions, and researchers say that’s exactly what they found. “Our findings show that short sleep duration is also associated with multimorbidity,” Sabia said.

Multimorbidity simply means the co-occurrence of two or more chronic conditions. It’s something that becomes more likely as we age, but researchers expressed concern as it appears to be on the rise in some regions. “Multimorbidity is on the rise in high-income countries, and more than half of older adults now have at least two chronic diseases,” Sabia said. “This is proving to be a major challenge for public health, as multimorbidity is associated with high health care service use, hospitalizations and disability.”

For this study, researchers looked at data from the Whitehall II cohort study, a database of more than 10,000 people who were employed in the London offices of the British Civil Service at the beginning of the data collection phase in 1985. Participants then reported for followups to track their health as they aged. They self-reported on their sleep duration around six times between 1985 and 2016. Researchers looked at this data and isolated sleep duration data given from participants when they were 50, 60 and 70 years of age, looking at around 7,000 participants in total. They then looked at whether these participants had any chronic conditions, and, if so, when they developed. Their definition of chronic diseases included diabetes, cancer, coronary heart disease, stroke, heart failure, chronic kidney disease, liver disease, depression, dementia, Parkinson’s disease, chronic obstructive pulmonary disease and arthritis.

Supporting previous research regarding the risk of individual chronic illnesses, sleeping for five hours or fewer at age 50 was associated with a 20 per cent risk of being diagnosed with a single chronic illness, compared to those getting the recommended hours. They found that those who reported regularly sleeping for five hours or fewer at the age of 50, 60 and 70 had a 30-40 per cent increased risk of multimorbidity compared to people who were sleeping for around seven hours a night. They also found that those who reported five hours of sleep at age 50 were 25 per cent more likely to have subsequently died at some point across the 25-year followup period — an association that may have to do with the increased risk of chronic diseases that could be responsible for mortality, researchers explained.

But does sleeping longer than advised have any associations with chronic illness? According to the research, it might when we’re getting up into our 60s and 70s, but perhaps not before. When researchers looked at whether sleeping for nine hours or more had any negative health outcomes, there was an association between the incidence of multimorbidity at age 60 and age 70. However, they found no clear association between extended sleep durations at age 50 in healthy people and multimorbidity. If participants already had one chronic illness at age 50, long sleepers did have a 35 per cent increased risk of developing another illness, perhaps due to underlying health conditions, researchers suggest.

Jo Whitmore, a senior cardiac nurse at the British Heart Foundation who was not involved in the research, said in the release that “Getting enough sleep allows your body to rest.” “There are a host of other ways that poor sleep could increase the risk of heart disease or stroke, including by increasing inflammation and increasing blood pressure,” she added. “This research adds to a growing body of research that highlights the importance of getting a good night’s sleep.”

Sabia said that getting a good night’s sleep requires “good sleep hygiene, such as making sure the bedroom is quiet, dark and a comfortable temperature before sleeping.” “It’s also advised to remove electronic devices and avoid large meals before bedtime. Physical activity and exposure to light during the day might also promote good sleep.”

Your answers to the questions on the next page must be written in complete sentences, not bullet form.

Please include only your solutions (no assignment questions or instructions) in your submissions to Crowdmark and the LEARN Dropbox.

To answer these questions you may also wish to consult the corresponding research article “Association of sleep duration and risk of multimorbidity in the UK” posted on LEARN in the Assignment 3 folder.

(a) Can the empirical study described in the media article and the research article best be described as a sample survey, an observational study or an experimental study? Justify your answer.

(b) Based on the media article,  what is the Problem for this study? (Note: the Problem should be defined concisely in one or two sentences.)

(c) What type of Problem (descriptive, causative, predictive) is this? Be sure to justify your answer.

(d) The researchers in this study did not clearly define their target population/process. Use the information provided in the media article and the research article to suggest a reasonable target population/process for this study. You should clearly identify what a unit is in the target population/process. Be sure to justify your choice of target population/process.

(e) Use the information provided in the research article to define the study population/process for this study. Be sure to justify your choice of study population/process.

(f) Many variates were observed for the participants in this study. Use the information provided in the research article to answer the questions below. (Note: To determine the type of variate it is useful to think about the possible values for the variate. Variates can be discrete, continuous, categorical, ordinal or complex.)

(1) One of the two most important variates collected in this study was sleep duration. How many hours of sleep a person has on an average week-night is a continuous variate. Such a variate would be difficult to measure. Explain clearly how the variate sleep duration was collected in this study. What is its type?

(2) When a new variate is created from variates which are measured for each unit in a study the new variate is called a derived variate. For example, BMI (body mass index) is determined by a person’s height and weight and therefore BMI would be called a derived variate.

The other most  important variate collected in this study is multimorbidity. Multimorbidity is a derived variate. Explain clearly how the derived variate multimorbidity was obtained in this study. What is its type?

Other variates were also collected in this study. Include the following completed sentences in your Crowdmark submission:

(3) “Mortality” is a _______________  variate.

(4) “Ethnicity” is a _______________  variate.

(5) “Occupational position” is a _______________  variate.

(6) “Time spent in moderate and vigorous physical activity” is a _______________  variate.

(g) For each of the variates in (f), give an example of a corresponding attribute of the study population/process which could be estimated/determined in this study. (Note: Please check the examples of attributes given on pages 115-116 of the Course Notes.)

(h) Give the definition of study error. Give one example of study error in relation to your choice for the target population/process (part (d)), your choice for the study population/process (part (e)), and one of the attributes in (g). Your answer should clearly explain why you believe this is a source of study error.

(i) Based on the information provided, describe the sampling protocol for this study in as much detail as possible. Be sure to clearly indicate what the sample is and give the sample size.

(j) Give the definition of sample error. Give one example of sample error in relation to your choice for the study population/process (part (e)),  the sample (part (i)), and one of the attributes in (g). Your answer should clearly explain why you believe this is a source of sample error.

(k) Give the definition of measurement error. Give one possible example of measurement error for one of the variates in (f).

(l) In one or two sentences give the main conclusion of the study made by the author of the media article. Is this conclusion the same as in the corresponding research article? Explain why or why not.

(m) Give what you think is the most important limitation to this study. Briefly explain why you think this limitation is so important.

Problem 2:  Sampling distribution of the sample mean

The purpose of this problem is to investigate the factors which affect the sampling distribution of the sample mean.  Please see Section 4.2 of the Course Notes.

To do this investigation you will use the R shiny app:

https://shiny.math.uwaterloo.ca/sas/stat231/samplingdistributions/

The inputs for the shiny app are: distribution, sample size, parameter(s) of the distribution, and number of samples. You are asked to conduct an investigation of the factors which affect the sampling distribution of the sample mean by varying these inputs.

Note: the Shiny app limits you to values for n and N such that n*N does not exceed 50,000. R code is provided in an R file called Assignment3Problem2RCode posted in the Assignment 3 folder. This R code, which you can run on your own computer, will generate the same outputs as the Shiny app. You should try out the code if you are having trouble seeing patterns in the Shiny app, as it will allow you to explore situations with much larger values of n and N.
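As an illustration of what such a simulation involves, here is a minimal sketch in R. It is not the posted Assignment3Problem2RCode file. The function name `simulate_sample_means`, the seed, and the number of histogram bins are hypothetical choices made for this sketch, and it assumes that the Exponential distribution is parameterized by its mean, so that the rate passed to `rexp()` is one over the parameter.

```r
# Sketch only: this is NOT the course's Assignment3Problem2RCode file.
# Assumption: the Exponential distribution is parameterized by its mean
# theta, so the rate passed to rexp() is 1 / theta.

# Hypothetical helper: draw N samples of size n, return the N sample means.
simulate_sample_means <- function(n, N, theta) {
  # Each column of the n x N matrix is one random sample of size n.
  samples <- matrix(rexp(n * N, rate = 1 / theta), nrow = n, ncol = N)
  colMeans(samples)
}

set.seed(231)  # arbitrary seed so that the run is reproducible
ybar <- simulate_sample_means(n = 25, N = 500, theta = 2)

# Numerical summaries of the simulated sample means.
print(c(mean = mean(ybar), sd = sd(ybar)))

# Histogram approximating the sampling distribution of the sample mean.
hist(ybar, breaks = 30, freq = FALSE,
     main = "Simulated sample means", xlab = "Sample mean")
```

Because the sketch runs locally, n and N can be made much larger than the app allows, for example `simulate_sample_means(n = 200, N = 10000, theta = 2)`.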

All written answers must be in full sentences.

Please include only your solutions (no assignment questions or instructions) in your submissions to Crowdmark and the LEARN Dropbox.

Note: When describing the shape of a distribution it is helpful to answer the following questions:

Is the distribution unimodal? Is the shape symmetric, skewed to the left or skewed to the right? If the distribution is symmetric is it bell-shaped, u-shaped or uniform? If the distribution is symmetric, does the distribution have fat or thin tails as compared to the Gaussian distribution?

In this problem you will investigate the properties of the sampling distribution of the sample mean when random samples are drawn from the Exponential distribution. Each part of the investigation will begin with the app set at the following initial setting.

Initial setting: On the app select the distribution Exponential, input 25 as the sample size, 2 as the value of $\theta$, and 500 as the number of samples.

Important Note: You will sometimes need to change the number of bins to adequately display the sampling distribution.

(a) The plot displayed on the left of the shiny app is the population distribution from which the random sample is drawn.

(i) What is the mean of this population for the initial setting?

(ii) What is the standard deviation of this population for the initial setting?

(iii) Describe the shape of this population for the initial setting.

(b) The plot displayed on the right of the shiny app is an approximation of the sampling distribution of the sample mean based on drawing N samples from the population you described in (a). By clicking on the Resample! button you can observe what happens for different samples.

(i) What is $E(\bar{Y})$ = the expected value of the sample mean for the initial setting?

(ii) What is $sd(\bar{Y})$ = the standard deviation of the sample mean for the initial setting?

(iii) Describe the shape of the sampling distribution of the sample mean for the initial setting.

(iv) Is the random variable $\bar{Y}$ a discrete or continuous random variable? Explain your answer. (Hint: Think about the possible values that the random variable can assume.)

In parts (c)-(e) of this problem you will investigate factors affecting the sampling distribution of the sample mean when samples are drawn from an Exponential($\theta$) population by beginning with the initial setting and then varying one of the inputs of the shiny app. For each part you are asked to answer the following 3 questions based on what you observe.

Question 1: As you vary this input does the location of the sampling distribution of the sample mean change? Explain why changes occur or don’t occur.

Question 2: As you vary this input does the spread of the sampling distribution of the sample mean change? Explain why changes occur or don’t occur.

Question 3: As you vary this input does the shape of the sampling distribution of the sample mean change? Explain why changes occur or don’t occur.

(c) Set the values of the inputs as given in the Initial setting. Vary the value of $\theta$ from 2 to 30 holding all other inputs fixed. Answer Questions 1-3. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(d) Set the values of the inputs as given in the Initial setting. Vary the value of the sample size from 25 to 100 holding all other inputs fixed. Answer Questions 1-3. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(e) Set the values of the inputs as given in the Initial setting. Vary the value of the number of samples from 500 to 1000 holding all other inputs fixed. Answer Questions 1-3. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(f) On the app input 25 as the sample size, 2 as the value of $\theta$, 500 as the number of samples, and 0.25 as the margin of error. What percentage of the sample estimates of the population mean are within 0.2 of the true value? Change the sample size to 50. What percentage of the sample estimates of the population mean are now within 0.2 of the true value? Explain in non-mathematical terms why this makes sense.

Problem 3:  Sampling distribution of the likelihood ratio statistic

The purpose of this problem is to investigate the factors which affect the sampling distribution of the likelihood ratio statistic. The likelihood ratio statistic is a random variable defined as

$$\Lambda(\theta) = -2\log\left[\frac{L(\theta)}{L(\tilde{\theta})}\right]$$

where $L(\theta)$ is the likelihood function and $\tilde{\theta}$ is the maximum likelihood estimator of $\theta$. The likelihood ratio statistic is a function of the random variables $Y_1, Y_2, \ldots, Y_n$. The sampling distribution of the likelihood ratio statistic is approximately Chi-squared(1) for large n. Please see Section 4.6 of the Course Notes.

To investigate the sampling distribution of the likelihood ratio statistic for different distributions you will use the R shiny app:

https://shiny.math.uwaterloo.ca/sas/stat231/LRstatistic/

The inputs for the shiny app are: distribution, sample size, parameter(s) of the distribution, and number of samples. You are asked to conduct an investigation of the factors which affect how well the Chi-squared(1) distribution approximates the sampling distribution of the likelihood ratio statistic by varying these inputs.

All written answers must be in full sentences.

Please include only your solutions (no assignment questions or instructions) in your submissions to Crowdmark and the LEARN Dropbox.

Important Note: You will sometimes need to change the number of bins to adequately display the sampling distribution.

(a) Sampling distribution of the likelihood ratio statistic for random samples from the Poisson distribution

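For a random sample of size n from a Poisson($\theta$) distribution the likelihood ratio statistic is the random variable

$$\Lambda(\theta) = 2n\left[\bar{Y}\log\left(\frac{\bar{Y}}{\theta}\right) + \theta - \bar{Y}\right]$$

where

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

is the sample mean.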
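The form of the Poisson likelihood ratio statistic can be checked against the general definition $\Lambda(\theta) = -2\log[L(\theta)/L(\tilde{\theta})]$. The following is a sketch of that algebra. It assumes the Poisson distribution has mean $\theta$ and drops factors of the likelihood that do not depend on $\theta$, since they cancel in the ratio.

```latex
\begin{align*}
L(\theta) &= \theta^{n\bar{Y}} e^{-n\theta}, \qquad \tilde{\theta} = \bar{Y}, \\
\log\frac{L(\theta)}{L(\tilde{\theta})}
  &= n\bar{Y}\log\theta - n\theta - n\bar{Y}\log\bar{Y} + n\bar{Y}, \\
\Lambda(\theta) &= -2\log\frac{L(\theta)}{L(\tilde{\theta})}
  = 2n\left[\bar{Y}\log\left(\frac{\bar{Y}}{\theta}\right) + \theta - \bar{Y}\right].
\end{align*}
```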

(i) Is the random variable $\bar{Y}$ a discrete or continuous random variable? Explain your answer. (Hint: Think about the possible values that the random variable can assume.)

(ii) Is the likelihood ratio statistic for the Poisson distribution a discrete or continuous random variable? Explain your answer. (Hint: Think about the possible values that the random variable can assume.)

Initial setting: On the app select the distribution Poisson, input 30 as the sample size, 2 as the value of $\theta$, and 500 as the number of samples. Click on “Add chi-squared distribution curve?”.

The plot displayed on the left of the shiny app is the population distribution from which the random sample is drawn. The plot displayed on the right of the shiny app is an approximation of the sampling distribution of the likelihood ratio statistic based on drawing N samples of size n from the Poisson($\theta$) distribution. (The exact distribution of the random variable $\Lambda(\theta)$ would be difficult to obtain.) By clicking on the Resample! button you can observe what happens for different samples.
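The kind of simulation behind the right-hand plot can be sketched in R as follows. This is not the app's source code. The helper name `poisson_lr`, the seed, and the plotting choices are hypothetical. The sketch assumes the Poisson likelihood ratio statistic takes the form 2n[ȳ log(ȳ/θ) + θ - ȳ], evaluated at the value of θ used to generate the data, with ȳ log(ȳ) taken to be 0 when ȳ = 0.

```r
# Sketch only: hypothetical helper, not the Shiny app's source code.
# Assumed form of the Poisson likelihood ratio statistic at theta:
#   2n [ ybar * log(ybar / theta) + theta - ybar ],
# with ybar * log(ybar) taken to be 0 when ybar = 0.
poisson_lr <- function(ybar, n, theta) {
  term <- ifelse(ybar > 0, ybar * log(ybar / theta), 0)
  2 * n * (term + theta - ybar)
}

set.seed(231)  # arbitrary seed so that the run is reproducible
theta <- 2     # parameter of the Poisson distribution
n <- 30        # sample size
N <- 500       # number of samples

# Each column is one sample of size n; keep one sample mean per sample.
ybar <- colMeans(matrix(rpois(n * N, lambda = theta), nrow = n, ncol = N))
lr_values <- poisson_lr(ybar, n, theta)

# Histogram of the simulated statistic with the Chi-squared(1) density added.
hist(lr_values, breaks = 40, freq = FALSE,
     main = "Simulated likelihood ratio statistic", xlab = "Value")
curve(dchisq(x, df = 1), from = 0.01, to = max(lr_values), add = TRUE)

# Proportion of simulated values at or below the Chi-squared(1) 0.95 quantile.
print(mean(lr_values <= qchisq(0.95, df = 1)))
```

The final line gives one numerical check to set beside the visual comparison: if the approximation is good, the printed proportion should be close to 0.95.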


In parts (iii)-(v) of this problem you will investigate factors affecting the sampling distribution of the likelihood ratio statistic when samples are drawn from a Poisson($\theta$) population by beginning with the initial setting and then varying one of the inputs of the shiny app.

(iii) Set the values of the inputs as given in the Initial setting. Vary the value of $\theta$ from 1 to 30 holding all other inputs fixed. As you vary this input, does the fit of the Chi-squared(1) distribution to the histogram change? Explain why changes occur or don’t occur. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(iv) Set the values of the inputs as given in the Initial setting. Vary the value of the sample size n from 10 to 100 holding all other inputs fixed. As you vary this input, does the fit of the Chi-squared(1) distribution to the histogram change? Explain why changes occur or don’t occur. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(v) Set the values of the inputs as given in the Initial setting. Vary the value of the number of samples N from 700 to 1000 holding all other inputs fixed. As you vary this input, does the fit of the Chi-squared(1) distribution to the histogram change? Explain why changes occur or don’t occur. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(b) Sampling distribution of the likelihood ratio statistic for random samples from the Exponential distribution

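For a random sample of size n from an Exponential($\theta$) distribution the likelihood ratio statistic is the random variable

$$\Lambda(\theta) = 2n\left[\frac{\bar{Y}}{\theta} - 1 - \log\left(\frac{\bar{Y}}{\theta}\right)\right]$$

where

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

is the sample mean.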
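The form of the Exponential likelihood ratio statistic can likewise be checked against the general definition $\Lambda(\theta) = -2\log[L(\theta)/L(\tilde{\theta})]$. The following sketch assumes the Exponential distribution is parameterized by its mean $\theta$, with probability density function $f(y;\theta) = \frac{1}{\theta}e^{-y/\theta}$ for $y > 0$. A rate parameterization would give a different expression.

```latex
\begin{align*}
L(\theta) &= \theta^{-n} e^{-n\bar{Y}/\theta}, \qquad \tilde{\theta} = \bar{Y}, \\
\log\frac{L(\theta)}{L(\tilde{\theta})}
  &= -n\log\theta - \frac{n\bar{Y}}{\theta} + n\log\bar{Y} + n, \\
\Lambda(\theta) &= -2\log\frac{L(\theta)}{L(\tilde{\theta})}
  = 2n\left[\frac{\bar{Y}}{\theta} - 1 - \log\left(\frac{\bar{Y}}{\theta}\right)\right].
\end{align*}
```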

(i) Is the likelihood ratio statistic for the Exponential distribution a discrete or continuous random variable? Explain your answer. (Hint: Think about the possible values that the random variable can assume.)

Initial setting: On the app select the distribution Exponential, input 25 as the sample size, 3 as the value of $\theta$, and 500 as the number of samples. Click on “Add chi-squared distribution curve?”.

The plot displayed on the left of the shiny app is the population distribution from which the random sample is drawn. The plot displayed on the right of the shiny app is an approximation of the sampling distribution of the likelihood ratio statistic based on drawing N samples of size n from the Exponential($\theta$) distribution. (The exact distribution of the random variable $\Lambda(\theta)$ would be difficult to obtain.) By clicking on the Resample! button you can observe what happens for different samples.

In parts (ii)-(iv) of this problem you will investigate factors affecting the sampling distribution of the likelihood ratio statistic when samples are drawn from an Exponential($\theta$) population by beginning with the initial setting and then varying one of the inputs of the shiny app.

(ii) Set the values of the inputs as given in the Initial setting. Vary the value of $\theta$ from 1 to 30 holding all other inputs fixed. As you vary this input, does the fit of the Chi-squared(1) distribution to the histogram change? Explain why changes occur or don’t occur. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(iii) Set the values of the inputs as given in the Initial setting. Vary the value of the sample size n from 10 to 100 holding all other inputs fixed. As you vary this input, does the fit of the Chi-squared(1) distribution to the histogram change? Explain why changes occur or don’t occur. (Be sure to hit the Resample! button many times to see what changes for different samples.)

(iv) Set the values of the inputs as given in the Initial setting. Vary the value of the number of samples N from 700 to 1000 holding all other inputs fixed. As you vary this input, does the fit of the Chi-squared(1) distribution to the histogram change? Explain why changes occur or don’t occur. (Be sure to hit the Resample! button many times to see what changes for different samples.)