CS544 Module 2 Assignment

General Rules for Homework Assignments

• You must work on your assignments individually. Copying answers from others is not allowed.

• Each assignment has a strict deadline. If you expect a delay, contact the instructor. Late submissions without a valid reason will receive a grade deduction.

• When the term lastName is referenced in an assignment, please replace it with your last name.

 

Part1) Probability - 30 points

Show your solutions with hand calculations (without using R), then verify them with R code.

 

a) A disease affects 6 out of 100 people on average. The sensitivity of a clinical test for the disease is 95%, which means 95% of people who have the disease test positive. Its false positive rate is 7%, which means 7% of people who do not have the disease also test positive. What is the chance that a randomly selected person with a positive result does not have the disease? What is the chance that a randomly selected person with a negative result actually has the disease?
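As an illustration of the kind of R check this part asks for, the sketch below applies Bayes' rule to a made-up screening test. The numbers (2% prevalence, 90% sensitivity, 10% false positive rate) and all variable names are hypothetical and are deliberately not those of the question.

```r
# Hypothetical inputs (NOT the values from the question above)
prevalence  <- 0.02   # P(D)
sensitivity <- 0.90   # P(+ | D)
false_pos   <- 0.10   # P(+ | not D)

# Law of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos <- sensitivity * prevalence + false_pos * (1 - prevalence)
p_neg <- 1 - p_pos

# Bayes' rule for the two kinds of conditional probability
p_healthy_given_pos <- false_pos * (1 - prevalence) / p_pos
p_disease_given_neg <- (1 - sensitivity) * prevalence / p_neg

round(c(p_pos = p_pos,
        healthy_given_pos = p_healthy_given_pos,
        disease_given_neg = p_disease_given_neg), 4)
```

The same three steps (total probability, complement, Bayes' rule) mirror a hand calculation line by line, which makes the R output easy to compare against it.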

 

b) Suppose that in a particular state, 40% of registered voters are Democrats, 50% are Republicans, and the rest are Independents. A ballot question asks whether to provide universal healthcare to citizens. Suppose that 92% of Democrats, 55% of Republicans, and 40% of Independents favor universal healthcare. If a randomly chosen person favors universal healthcare, what is the probability that the person is i) a Democrat, ii) a Republican, iii) an Independent?
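When the population is split into more than two groups, the same reasoning can be written once as a vectorized helper. The sketch below is one possible way to do this; the function name posterior and the three-group numbers are hypothetical and are not taken from the question.

```r
# posterior(): Bayes' rule over a partition of k groups.
#   prior      - P(group), must sum to 1
#   likelihood - P(event | group), in the same order as prior
posterior <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  joint <- prior * likelihood      # P(group and event)
  joint / sum(joint)               # divide by P(event)
}

# Hypothetical three-group example (NOT the numbers from the question)
prior      <- c(A = 0.5, B = 0.3, C = 0.2)
likelihood <- c(A = 0.1, B = 0.4, C = 0.5)

post <- posterior(prior, likelihood)
round(post, 4)
sum(post)   # posteriors over a partition sum to 1
```

Checking that the posteriors sum to 1 is a quick sanity test for both the hand calculation and the code.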

 

 

Part2) Random Variables - 35 points

a) Consider the experiment of rolling three dice. Using R, show how you would use a user-defined function to define a random variable that is the mean of the three rolls rounded to the nearest integer.

 

b) Using the above result, what is the probability that the random variable equals 3? What is the probability that the random variable takes a value of at most 3? What is the probability that the random variable takes on a value of at least 3? Use the Prob function as shown in the code samples.
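The assignment expects the functions from the course code samples (such as Prob). To show the underlying idea without them, the sketch below builds a probability space in base R only, for a different and simpler experiment: two dice, with a random variable equal to the larger roll. The names S, U, and largest_roll are hypothetical.

```r
# Sample space for two fair dice: 36 equally likely outcomes
S <- expand.grid(X1 = 1:6, X2 = 1:6)
S$probs <- rep(1 / nrow(S), nrow(S))

# User-defined function that maps one outcome (a row) to a number
largest_roll <- function(x) max(x)

# Random variable U = largest of the two rolls
S$U <- apply(S[, c("X1", "X2")], 1, largest_roll)

# Event probabilities: sum the probabilities of the matching outcomes
p_equal    <- sum(S$probs[S$U == 4])   # P(U = 4)
p_at_most  <- sum(S$probs[S$U <= 4])   # P(U <= 4)
p_at_least <- sum(S$probs[S$U >= 4])   # P(U >= 4)

c(equal = p_equal, at_most = p_at_most, at_least = p_at_least)
```

Note that "at most" and "at least" both include the boundary value, so the two probabilities overlap on the event U = 4 and do not sum to 1.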

 

c) Show the marginal distribution of the above random variable (using R).

 

d) Using R, add another random variable to the above probability space using a user-defined function. The random variable is TRUE if the first random variable is even and FALSE otherwise. What is the probability that the first random variable is even? Show the marginal distribution for the second random variable.
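A marginal distribution is the total probability attached to each distinct value of a random variable. The base-R sketch below illustrates this, together with a logical random variable derived from a numeric one, on a hypothetical two-dice space. The variable names and the rule "greater than 3" are assumptions chosen for illustration, not the ones the question asks for.

```r
# Hypothetical two-dice space as a base-R data frame
S <- expand.grid(X1 = 1:6, X2 = 1:6)
S$probs <- rep(1 / nrow(S), nrow(S))
S$U <- apply(S[, c("X1", "X2")], 1, function(x) max(x))

# Second random variable, derived from the first with a user-defined function
exceeds_three <- function(u) u > 3
S$V <- exceeds_three(S$U)

# Marginal distribution: total probability for each distinct value
marginal_U <- aggregate(probs ~ U, data = S, FUN = sum)
marginal_V <- aggregate(probs ~ V, data = S, FUN = sum)

marginal_U
marginal_V
```

The TRUE row of marginal_V is exactly the probability of the event that defines V, so the marginal of a logical random variable answers the "what is the probability that..." question directly.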


Part3) R - 35 points

Use the data() function to load the R data set airquality.

 

Provide the simplest R code and output for all of the following. The code should work for any given data.

a) Use the diff function to calculate the temperature differences between consecutive days.

Insert the value 0 at the beginning of these differences. Add this result as the DIFFS column of the data frame.

b) Calculate the number of days that are warmer than the previous day.
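The behavior of diff can be seen on a small made-up data frame. In the sketch below, the data frame readings and its columns are hypothetical; it shows that diff returns one value fewer than its input, which is why a leading 0 is needed before the result can be stored as a column.

```r
# Hypothetical daily readings (NOT the airquality data)
readings <- data.frame(day = 1:6, value = c(10, 12, 11, 11, 15, 14))

# diff() returns n - 1 consecutive differences; prepend 0 for the first
# day so the result has one entry per row
readings$CHANGE <- c(0, diff(readings$value))

# Count the days whose value is strictly higher than the previous day's
n_higher <- sum(readings$CHANGE > 0)

readings
n_higher
```

Summing a logical vector counts its TRUE entries, so no loop or explicit subsetting is needed for the count.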

c) Show the mean and median temperature for each month (May to Sept.). Do not hard code the month. Print out your result properly. For example, “The average and median temperature for May are xxx and xxx; …”.
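One possible way to avoid hard-coding month names is R's built-in month.name vector, indexed by the numeric month code. The sketch below illustrates this on a hypothetical data frame obs with made-up values; the column names and the output wording are assumptions.

```r
# Hypothetical data: numeric month codes with a few readings each
obs <- data.frame(month = c(1, 1, 1, 2, 2, 2),
                  value = c(3, 5, 10, 2, 4, 6))

# One summary per month code; names(avg) holds the codes as strings
avg <- tapply(obs$value, obs$month, mean)
med <- tapply(obs$value, obs$month, median)

# month.name is a built-in vector, so codes map to names without
# typing any month by hand
labels <- month.name[as.integer(names(avg))]

msg <- sprintf("The average and median value for %s are %.2f and %.2f",
               labels, avg, med)
cat(paste(msg, collapse = "; "), "\n")
```

Because sprintf is vectorized, one call produces a message per month, and paste with collapse joins them into a single line.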

d) Show the coldest and hottest day in each month (May to Sept.). Do not hard code the month. Print out your result properly. For example, “The coldest and hottest days in May are xxx and xxx; …”.
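Finding the row that holds a group's minimum or maximum can be sketched with split, which.min, and which.max. The data frame obs below is hypothetical, and the sketch assumes that "day" means the day-of-month value of the extreme reading.

```r
# Hypothetical data: month code, day of month, and a reading
obs <- data.frame(month = c(1, 1, 1, 2, 2, 2),
                  day   = c(1, 2, 3, 1, 2, 3),
                  value = c(7, 3, 9, 4, 8, 6))

# Split into one data frame per month, then pick the rows with the
# smallest and largest reading (which.min/which.max return the FIRST
# position on ties)
by_month <- split(obs, obs$month)
low_day  <- sapply(by_month, function(d) d$day[which.min(d$value)])
high_day <- sapply(by_month, function(d) d$day[which.max(d$value)])

labels <- month.name[as.integer(names(by_month))]
msg <- sprintf("The lowest and highest days in %s are day %d and day %d",
               labels, as.integer(low_day), as.integer(high_day))
cat(paste(msg, collapse = "; "), "\n")
```

If several days tie for the extreme value, this approach reports only the earliest one; reporting all tied days would need a comparison against min or max instead.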

 

Submission:

Upload your result file to the Assignments section of Blackboard.

Provide all R code in a single file, CS544_lastName.R. Clearly mark each subpart of each question and add appropriate comments.

If you need to submit more than one file, create a folder named CS544_lastName and place all files in it. Archive the folder as CS544_A1_lastName.zip.