Summer Examination Period 2021 — May — Semester B

ECS7005P Risk and Decision-Making for Data Science and AI

Question 1

A new virus is affecting the population. People who have the virus will normally have specific symptoms such as a cough and the loss of the sense of taste and/or smell.

It is estimated that 1 in 5 people who suffer these symptoms have the virus and 1 in 2000 people without these symptoms have the virus.

A test for the virus has the following accuracy:

· For people with symptoms, the true positive rate is 90% and the false positive rate is 5%

· For people without symptoms, the true positive rate is 80% and the false positive rate is 1%

Answer the following questions:

a) If we know that 5% of the population have symptoms, what percentage of the population has the virus? [2 marks]

b) What is the probability that a person with symptoms will test positive? [2 marks]

c) What is the probability that a person without symptoms will test positive? [2 marks]

d) A person with symptoms tests positive. What is the probability they have the virus? [2 marks]

e) A person with symptoms tests negative. What is the probability they have the virus? [2 marks]

f) A person without symptoms tests positive. What is the probability they have the virus? [2 marks]

g) A person without symptoms tests positive and is subject to an additional test. Assuming that a second test is independent of the first, what is the probability they test positive in this second test? [4 marks]

h) A person without symptoms tests positive in both the first and second test. What is the probability they have the virus? [4 marks]

[Question 1 Total: 20 marks]

Question 2

Table 1 summarizes the results from an observational study into the effectiveness of two drugs, A and B, for treating migraine.

	Aged < 50, effective	Aged < 50, non-effective	Aged 50+, effective	Aged 50+, non-effective
Drug A	420	80	70	30
Drug B	85	15	150	50

The ‘success rate’ is the percentage of effective outcomes.

Answer the following questions:

a) What was the ‘success rate’ for Drug A for the study participants overall? [1 mark]

b) What was the ‘success rate’ for Drug B for the study participants overall? [1 mark]

c) What was the ‘success rate’ for Drug A for the study participants aged < 50? [1 mark]

d) What was the ‘success rate’ for Drug B for the study participants aged < 50? [1 mark]

e) What was the ‘success rate’ for Drug A for the study participants aged 50+? [1 mark]

f) What was the ‘success rate’ for Drug B for the study participants aged 50+? [1 mark]

g) What can you conclude from the above results? [2 marks]

h) Name the paradox evident in this study. [1 mark]

i) What is the main cause of the paradox in this example? [3 marks]

j) Draw the causal model that explains the data and write down the probability tables for each node in that model. [6 marks]

k) How would you amend the model to one that avoids the paradox? [2 marks]

l) By doing what you proposed in k) (or by other means) estimate the ‘true’ success rate for each drug for the whole population. [4 marks]

m) Suppose you know that a patient took Drug A and the outcome was not effective. We don’t know the patient’s age, but we want to answer the counterfactual question: “Would the outcome have been effective if this patient had taken Drug B instead of Drug A?” In your answer, provide a sketch of a causal model that supports your reasoning. [6 marks]

[Question 2 Total: 30 marks]

Question 3

It is known that about 2.3% of people who have sleeping disorders have severe insomnia (defined as going more than 36 hours without being able to sleep at all).

A study of 1000 people who have sleeping disorders discovered that tea-drinkers (classified as those who drink more than 2 cups of tea a day) are more likely to suffer severe insomnia.

	Tea-drinkers	Not tea-drinkers
Severe insomnia	9	14
Other sleeping disorders	291	686
Total	300	700

a) Answer the following about people with sleeping disorders:

i) What is the relative increase in risk of having severe insomnia for tea drinkers compared to non-tea drinkers? [3 marks]

ii) What is the absolute increase in risk of having severe insomnia for tea drinkers compared to those who are not tea-drinkers? [3 marks]

b) Suppose we know that 10% of the population have sleeping disorders. Of those with sleeping disorders, 30% are tea-drinkers. Of those with no sleeping disorders, only 20% are tea-drinkers. Answer the following questions about the whole population:

i) What is the relative increase in risk of having severe insomnia for tea-drinkers compared to those who are not tea-drinkers? [5 marks]

ii) What is the absolute increase in risk of having severe insomnia for tea drinkers compared to those who are not tea-drinkers? [5 marks]

Hint: you should assume a population size of 100,000 and create two tables like above for people with and without sleep disorders.

c) What paradox could be triggered if you used the above 1000-person study to make inferences about the risk of severe insomnia caused by tea-drinking in the entire population? [2 marks]

d) Which of the following headlines is the most misleading? [2 marks]

i) “Study shows people with sleeping disorders should consider cutting down on the amount of tea they drink”.

ii) “Drinking more than 2 cups of tea a day more than doubles the risk of having the most severe form of sleep disorder”.

iii) “People with sleeping disorders who drink more than 2 cups of tea a day are at increased risk of the most severe sleep deprivation”.

iv) “Drinking more than 2 cups of tea a day may lead to severe sleep deprivation”.

[Question 3 Total: 20 marks]

Question 4

The following algorithm is ‘learnt’ from a subset of the dataset of passengers on the Titanic cruise liner which sank after hitting an iceberg on 15 April 1912:

If Sex = “Male” then Probability (survive) = 0.2

If Sex = “Female” and Class = 1 or 2 then Probability (survive) = 0.8

If Sex = “Female” and Class = 3 then Probability (survive) = 0.6

The relevant information in the different test dataset is summarized as:

	Male	Female Class 1 or 2	Female Class 3
Survived	75	75	60
Did not survive	225	15	50

Based on this test set data, the accuracy of the algorithm for cut-off value 0.1 can be represented in the following format, where “YES” means survive and “NO” means not survive.

	Number predicted YES	Number predicted NO	Total
Number YES’s	210	0	210
Number NO’s	290	0	290

This enables us to compute:

Sensitivity: 100%; Specificity: 0%; False positive rate: 100%; Accuracy: 42%
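The worked example for cut-off 0.1 can be reproduced with a short Python sketch. It is an illustration rather than part of the exam paper: the names `RULE`, `TEST`, `confusion` and `metrics` are hypothetical, and it assumes a group is predicted YES when its predicted probability is at least the cut-off (the paper does not say whether the comparison is strict, and none of the cut-offs used here coincide with a predicted probability).

```python
# Predicted survival probability for each group (the 'learnt' rule).
RULE = {"Male": 0.2, "Female Class 1 or 2": 0.8, "Female Class 3": 0.6}

# Test set counts per group: (survived, did not survive).
TEST = {
    "Male": (75, 225),
    "Female Class 1 or 2": (75, 15),
    "Female Class 3": (60, 50),
}


def confusion(cutoff):
    """Return (TP, FN, FP, TN), assuming 'predict YES if probability >= cutoff'."""
    tp = fn = fp = tn = 0
    for group, prob in RULE.items():
        survived, died = TEST[group]
        if prob >= cutoff:
            # Everyone in this group is predicted YES.
            tp += survived
            fp += died
        else:
            # Everyone in this group is predicted NO.
            fn += survived
            tn += died
    return tp, fn, fp, tn


def metrics(cutoff):
    """Sensitivity, specificity, false positive rate and accuracy as fractions."""
    tp, fn, fp, tn = confusion(cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


print(confusion(0.1))  # (TP, FN, FP, TN) for the worked example
print(metrics(0.1))
```

Calling `metrics` with a different cut-off gives the corresponding figures for that threshold.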

a) For each of the different cut-off values 0.5, 0.7, 0.9 complete the following table and fill in all the missing ?? values

	Number predicted YES	Number predicted NO	Total
Number YES’s	??	??	210
Number NO’s	??	??	290

Sensitivity: ??%; Specificity: ??%; False positive rate: ??%; Accuracy: ??%

You will need to complete three tables and, in each case, give the sensitivity, specificity, false positive rate and accuracy percentages (8 marks each). [24 marks]

b) Sketch the ROC curve for this algorithm. [6 marks]

[Question 4 Total: 30 marks]

Solutions

Question 1

a) If we know that 5% of the population have symptoms, what percentage of the population has the virus? (0.05 x 0.2)+(0.95 x 0.0005) = 0.010475 = 1.0475% [2 marks]

b) What is the probability a person with symptoms will test positive? 22% [2 marks]

c) What is the probability a person without symptoms will test positive? 1.04% [2 marks]

d) A person with symptoms tests positive. What is the probability they have the virus? 81.8% [2 marks]

e) A person with symptoms tests negative. What is the probability they have the virus? 2.6% [2 marks]

f) A person without symptoms tests positive. What is the probability they have the virus? 3.8% [2 marks]

g) A person without symptoms tests positive. Assuming that a second test is independent of the first, what is the probability they test positive in a second test? 4.04% [4 marks]

h) A person without symptoms tests positive in both the first and second test. What is the probability they have the virus? 76.2% [4 marks]
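The answers to Question 1 follow from the law of total probability and Bayes' theorem. The Python sketch below is one way to check them; it is not part of the original solutions, and the names `p_positive`, `posterior` and `answers` are hypothetical. For parts g) and h) it assumes that "independent" means the two test results are independent given the person's true virus status, so the posterior from the first test is used as the prior for the second.

```python
# Figures taken from the question statement.
P_SYMPTOMS = 0.05
PRIOR = {"symptoms": 1 / 5, "no_symptoms": 1 / 2000}  # P(virus | group)
TPR = {"symptoms": 0.90, "no_symptoms": 0.80}         # P(positive | virus)
FPR = {"symptoms": 0.05, "no_symptoms": 0.01}         # P(positive | no virus)


def p_positive(prior, tpr, fpr):
    """Total probability of a positive test result."""
    return prior * tpr + (1 - prior) * fpr


def posterior(prior, tpr, fpr, positive=True):
    """P(virus | test result) by Bayes' theorem."""
    if positive:
        return prior * tpr / p_positive(prior, tpr, fpr)
    return prior * (1 - tpr) / (1 - p_positive(prior, tpr, fpr))


s, n = "symptoms", "no_symptoms"
answers = {
    "a": P_SYMPTOMS * PRIOR[s] + (1 - P_SYMPTOMS) * PRIOR[n],
    "b": p_positive(PRIOR[s], TPR[s], FPR[s]),
    "c": p_positive(PRIOR[n], TPR[n], FPR[n]),
    "d": posterior(PRIOR[s], TPR[s], FPR[s]),
    "e": posterior(PRIOR[s], TPR[s], FPR[s], positive=False),
    "f": posterior(PRIOR[n], TPR[n], FPR[n]),
}
# Assumption: the posterior after the first positive test (part f)
# becomes the prior for the second test.
answers["g"] = p_positive(answers["f"], TPR[n], FPR[n])
answers["h"] = posterior(answers["f"], TPR[n], FPR[n])

for part, value in answers.items():
    print(f"{part}) {100 * value:.4g}%")
```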

Question 2

a) Drug A overall? 81.7% [1 mark]

b) Drug B overall? 78.3% [1 mark]

c) Drug A for the study participants aged < 50? 84% [1 mark]

d) Drug B for the study participants aged < 50? 85% [1 mark]

e) Drug A for the study participants aged 50+? 70% [1 mark]

f) Drug B for the study participants aged 50+? 75% [1 mark]

g) In each age subcategory Drug B was more effective than Drug A, but overall Drug A was more effective. [2 marks]

h) Simpson’s paradox [1 mark]

i) Age is a confounder. There were fewer older people in the study and older people were more likely to take Drug B than Drug A [3 marks]

j) The model [6 marks]

k) Cut the link into node “Drug” [2 marks]

l) A: 79.3% B: 81.7% [4 marks]

m) [6 marks]
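The success rates in parts a) to f) and the adjusted rates in part l) can be reproduced from Table 1. The Python sketch below is an illustration, not the marking scheme's method; the function names are hypothetical. For part l) it assumes that the age split of the study participants (600 aged under 50, 300 aged 50+) stands in for the age distribution of the whole population, and it averages each drug's age-specific success rate over that distribution.

```python
# (effective, non_effective) counts from Table 1.
COUNTS = {
    "A": {"<50": (420, 80), "50+": (70, 30)},
    "B": {"<50": (85, 15), "50+": (150, 50)},
}


def success_rate(effective, non_effective):
    """Fraction of effective outcomes in one cell of the table."""
    return effective / (effective + non_effective)


def overall_rate(drug):
    """Success rate for a drug with both age groups pooled."""
    effective = sum(e for e, _ in COUNTS[drug].values())
    total = sum(e + n for e, n in COUNTS[drug].values())
    return effective / total


def age_distribution():
    """P(age group), estimated from all study participants (an assumption)."""
    totals = {}
    for by_age in COUNTS.values():
        for age, (e, n) in by_age.items():
            totals[age] = totals.get(age, 0) + e + n
    grand_total = sum(totals.values())
    return {age: t / grand_total for age, t in totals.items()}


def adjusted_rate(drug):
    """Age-adjusted rate: sum over ages of P(age) * P(effective | drug, age)."""
    p_age = age_distribution()
    return sum(p_age[age] * success_rate(*COUNTS[drug][age]) for age in p_age)


for drug in COUNTS:
    print(drug, f"overall {overall_rate(drug):.1%}", f"adjusted {adjusted_rate(drug):.1%}")
```

The pooled rates favour Drug A while the adjusted rates favour Drug B, which is the reversal described in parts g) and h).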

Question 3 (TOTAL 20 marks)

a) People in the study

i) Tea drinkers: 9/300 = 3%; non-tea drinkers: 14/700 = 2%. So the relative risk increase is (3 - 2)/2 = 50%. [3 marks]

ii) Absolute risk increase is 3% - 2% = 1 percentage point. [3 marks]

b) Whole population

	Sleep disorders (10,000)		No sleep disorders (90,000)
	Tea drinkers (3,000)	Non-tea drinkers (7,000)	Tea drinkers (18,000)	Non-tea drinkers (72,000)
Most severe	90	140	0	0
Not most severe	2,910	6,860	18,000	72,000

	Whole population (100,000)
	Tea drinkers (21,000)	Non-tea drinkers (79,000)
Most severe	90	140
Not most severe	20,910	78,860

i) 90 out of 21,000 tea drinkers (= 0.4286%) have the most severe form of sleep deprivation; 140 out of 79,000 non-tea drinkers (= 0.1772%) have the most severe form of sleep deprivation. So the relative risk increase is (0.4286 - 0.1772)/0.1772 = 142%. [5 marks]

ii) But the absolute risk increase is just 0.4286% - 0.1772% = 0.25 percentage points. [5 marks]

c) Berkson’s or Collider paradox [2 marks]

d) (ii) is the most misleading. [2 marks]
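The relative and absolute risk increases in parts a) and b) can be checked with the Python sketch below. It is an illustration only, and the helper `risk_increase` is a hypothetical name. It follows the hint in assuming a population of 100,000, and follows the solution table in assuming that nobody without a sleeping disorder has severe insomnia.

```python
def risk_increase(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return (relative increase, absolute increase) in risk, both as fractions."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    absolute = risk_exposed - risk_unexposed
    return absolute / risk_unexposed, absolute


# Part a): the 1000-person study of people with sleeping disorders.
study_rel, study_abs = risk_increase(9, 300, 14, 700)

# Part b): scale up to an assumed population of 100,000.
POPULATION = 100_000
with_disorder = POPULATION * 10 // 100                    # 10% have sleeping disorders
without_disorder = POPULATION - with_disorder
tea_with = with_disorder * 30 // 100                      # 30% of those drink tea
tea_without = without_disorder * 20 // 100                # 20% of the rest drink tea
severe_tea = tea_with * 9 // 300                          # study rate of 3%
severe_non_tea = (with_disorder - tea_with) * 14 // 700   # study rate of 2%

# Assumption: no severe insomnia among people without sleeping disorders.
tea_total = tea_with + tea_without
non_tea_total = POPULATION - tea_total
pop_rel, pop_abs = risk_increase(severe_tea, tea_total, severe_non_tea, non_tea_total)

print(f"study:      relative {study_rel:.0%}, absolute {100 * study_abs:.2f} points")
print(f"population: relative {pop_rel:.0%}, absolute {100 * pop_abs:.2f} points")
```

In this calculation the relative increase rises when moving from the study to the whole population while the absolute increase falls, which is why a headline quoting only the relative figure can mislead.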

Question 4

a) The accuracy for cut-off value 0.5 is: