Semester 2 Assessment, 2021

MAST20005 Statistics

Question 1 (10 marks)

You have random samples from three groups, $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$ and $Z \sim N(\mu_3, \sigma_3^2)$. These have sample sizes of $n_1$, $n_2$ and $n_3$, respectively. Some R output from analyzing these data is shown below. The variable x contains the observations of $X$, the variable y contains the observations of $Y$ and the variable z contains the observations of $Z$.

> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.527   3.654   4.172   4.136   4.653   5.530
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.096   3.339   3.761   3.917   4.497   6.186
> summary(z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.392   3.310   4.267   4.185   4.647   7.075
> c(length(x), length(y))
[1] 40 20
> c(sum(x), sum(y), sum(z))
[1] 165.44587  78.34268  41.84626
> c(sd(x), sd(y), sd(z))
[1] 0.7182277 1.0530746 0.7182277

(a) For each of the following quantities, state or calculate its value if possible, or otherwise explain why it is not possible.

(i) $y_{(20.5)}$

(ii) $n_2$

(iii) $\sigma_3$

(iv) ((20)

(b) For each of the following statements, state whether it is true, false, or not possible to determine from the given information. In each case state the values of the quantities, if possible.

(i) 北i = 40 ×

(ii) $\sigma_1 < \sigma_2$

(c) For each of the following, calculate the specified interval estimate if possible, or otherwise explain what further information you need in order to do the calculation.

(i) Calculate a 95% confidence interval for $\mu_1 - \mu_2$. You may assume $\sigma_1 = \sigma_2$.

(ii) Calculate a 95% confidence interval for $\sigma_3$.
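As a rough illustration of the kind of calculation part (c)(i) points at, the sketch below builds a pooled-variance t interval from the summary statistics printed in the R output. It is one possible approach under the stated equal-variance assumption, not an official solution; the names `sp`, `df`, `half_width` and `ci` are chosen here for illustration.

```r
# Summary statistics copied from the R output in Question 1.
n1 <- 40; n2 <- 20
xbar <- 165.44587 / n1   # sample mean of x
ybar <- 78.34268 / n2    # sample mean of y
s1 <- 0.7182277; s2 <- 1.0530746

# Pooled standard deviation, assuming sigma_1 = sigma_2.
df <- n1 + n2 - 2
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)

# Two-sided 95% interval for mu_1 - mu_2.
half_width <- qt(0.975, df) * sp * sqrt(1 / n1 + 1 / n2)
ci <- c(lower = xbar - ybar - half_width, upper = xbar - ybar + half_width)
print(ci)
```

The same pattern works for other confidence levels by changing the quantile passed to `qt`.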

Question 2 (9 marks)

The University of Texas Southwestern Medical Centre ran a survey to investigate if there is a relationship between the place of birth and the risk of newborns getting infected with meningitis at birth. Three birth scenarios were examined: hospital delivery, home delivery without emergency assistance (HD without EA) and home delivery with emergency assistance (HD with EA). The centre surveyed 1053 newborns. The collected data are summarised in the following R output (with some of the details removed):

> X
              HD with EA HD without EA Hospital delivery
No Meningitis         62            98               813
Meningitis            16            28                36
> chisq.test(X)

        Pearson’s Chi-squared test

data:  X
X-squared = 70.553, df = (??), p-value = 4.783e-16

(a) What is the “Pearson’s Chi-squared test” testing in this situation?

What is your conclusion based on the R output, using a significance level of 0.05?

(b) Assuming the null hypothesis of the test given above, what is the expected number of newborns with meningitis delivered at home without emergency assistance?

(d) Help a young researcher from the University of Texas Southwestern Medical Centre to perform a test to examine the claim:

Meningitis infection is less common among newborns that are delivered at home with emergency assistance than among newborns delivered at home without emergency assistance.

Use a significance level of 0.01. Clearly state your hypotheses, the definition and value of your test statistic, critical value(s) and conclusions.
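The table in this question can be re-entered in R to explore the test. The sketch below is only an illustration: it rebuilds the count matrix from the printed output (the object name `fit` is an assumption made here) and shows where the test statistic and the expected counts under the null hypothesis live in the result of `chisq.test`.

```r
# Counts copied from the table printed in Question 2 (filled column by column).
X <- matrix(c(62, 16, 98, 28, 813, 36), nrow = 2,
            dimnames = list(c("No Meningitis", "Meningitis"),
                            c("HD with EA", "HD without EA", "Hospital delivery")))

fit <- chisq.test(X)

# Test statistic; this should agree with the X-squared value in the output above.
print(fit$statistic)

# Expected counts under the null hypothesis: (row total * column total) / grand total.
print(fit$expected)
```

Each expected count can also be checked by hand from the margins, for example `sum(X[2, ]) * sum(X[, 2]) / sum(X)` for the "Meningitis" row and "HD without EA" column.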

Question 3 (12 marks)

Daniala wants to compare customer spending at different shops. For this purpose, she runs a one-way analysis of variance on a dataset containing information about customer spending (in dollars) at different shops in her suburb. A partial ANOVA table and some summary of the data are given below.

Source	df	SS	MS	F
Shop				
Error		117,864		
Total	423	159,175		

	Shop 1	Shop 2	Shop 3	Shop 4	Shop 5
Sample size	86	216	83	13	26
Sample mean	37.66	45.93	64.83	66.82	37.10
Sample standard deviation	16.56	18.67	14.10	2.496	11.32

(a) Complete the ANOVA table. Show your working.

(b) Is there evidence of any differences in the average spending between the five shops?

Consider a 10% significance level and state the relevant critical value.

(c) Assuming a normal distribution for the customer spending, test whether the variance of customer spending in shop 1 is smaller than that in shop 2. Use a significance level of 0.05.

(d) Ben found Daniala’s data very useful for his research on pensioners in the suburb. He took from this data a subset that related to spending by pensioners only. Coincidentally, each shop was visited by exactly 8 pensioners.

Let the population mean of the pensioners’ spending in shop $i$ be $\mu_i$, let the corresponding population variances be $\sigma^2$ (i.e. the same for each shop), and let $\mu = \frac{1}{5}\sum_{i=1}^{5} \mu_i$. Let the sample mean for shop $i$ be $\bar{X}_{i\cdot}$, and let $\bar{X}_{\cdot\cdot} = \frac{1}{5}\sum_{i=1}^{5} \bar{X}_{i\cdot}$ be the grand mean. Answer the following questions, showing your working.

(i) Express $E(\bar{X}_{\cdot\cdot})$ in terms of $\mu$.

(ii) Express $\mathrm{var}(\bar{X}_{\cdot\cdot})$ in terms of $\sigma$.

(iii) Express $E(\bar{X}_{\cdot\cdot}^2)$ in terms of $\mu$ and $\sigma$.
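The summary table for the five shops can be tied back to the partial ANOVA table numerically. The following sketch, offered as a consistency check rather than a model answer, recomputes the total degrees of freedom and an approximate error sum of squares from the per-shop sample sizes and standard deviations; the small gap to the tabulated 117,864 is what one would expect from the rounding of the standard deviations.

```r
# Per-shop sample sizes and standard deviations from the summary table.
n <- c(86, 216, 83, 13, 26)
s <- c(16.56, 18.67, 14.10, 2.496, 11.32)

# Total degrees of freedom: overall sample size minus one.
df_total <- sum(n) - 1

# Within-group (error) sum of squares: sum over shops of (n_i - 1) * s_i^2.
ss_error <- sum((n - 1) * s^2)

print(df_total)
print(ss_error)
```

The remaining entries of the table follow from the usual identities, such as SS(Total) = SS(Shop) + SS(Error).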

Question 4 (8 marks)

Let $X$ be the lifetime of a mobile phone battery after a full recharge. The distribution of $X$ can be modelled by a shifted exponential distribution, having pdf $f(x) = \lambda e^{-\lambda(x-5)}$, for $x > 5$.

(a) Find the mean of $X$.

(b) Find the median of $X$.

(c) Julia measured the lifetime of the battery of her mobile phone after a full recharge 20 times in a row (assume this constitutes a random sample). Across those 20 attempts, the mean lifetime was 17.3 hours, with a standard deviation of 1.1 hours. Calculate an approximate 95% confidence interval for the median lifetime of the battery.
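Analytical answers for the mean and median of a shifted exponential distribution can be sanity-checked numerically. The sketch below does this for a hypothetical rate of 0.5 (a value chosen here purely for illustration, not taken from the question), using numerical integration for the mean and root finding on the cdf for the median.

```r
# Hypothetical rate parameter, chosen only for illustration.
lambda <- 0.5

# Shifted exponential pdf on x > 5.
f <- function(x) lambda * exp(-lambda * (x - 5))

# The pdf should integrate to 1 over (5, Inf).
total_prob <- integrate(f, 5, Inf)$value

# Mean by numerical integration of x * f(x).
mean_num <- integrate(function(x) x * f(x), 5, Inf)$value

# Median: the point m where the cdf equals 0.5.
cdf_minus_half <- function(m) integrate(f, 5, m)$value - 0.5
median_num <- uniroot(cdf_minus_half, c(5.001, 50), tol = 1e-8)$root

print(c(total_prob = total_prob, mean = mean_num, median = median_num))
```

Comparing these numbers with a closed-form expression evaluated at the same rate is a quick way to catch algebra slips.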

Question 5 (10 marks)

Let $X_1, \dots, X_n$ be a random sample from the Poisson distribution with mean $\lambda$. This has pmf

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!},$$

where $x \in \{0, 1, \dots\}$ and $\lambda \in (0, \infty)$.

(a) Determine a sufficient statistic for $\lambda$.

Find the maximum likelihood estimator (MLE) of $\lambda$.

(d) Find the Cramér–Rao lower bound for unbiased estimators of $\lambda$.

Question 6 (10 marks)

The time required for Martina to answer a student’s question during her tutorials has an exponential distribution with mean $1/\lambda$. She measured the time required for a random sample of 30 students’ questions this semester and found that her average time was 2.5 minutes. As a prior for $\lambda$ use a gamma distribution with mean 0.2 and standard deviation 0.2.

(a) Write down the likelihood function for $\lambda$.

(b) Derive the posterior distribution for $\lambda$.

(c) Is the prior distribution that is used here conjugate? Justify your answer.

(d) Find the posterior mean and posterior standard deviation of $\lambda$.
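The arithmetic behind a gamma prior combined with exponential data can be sketched in a few lines of R. This is an illustrative calculation under the standard shape/rate parameterisation of the gamma distribution, not a full derivation; the names `a0`, `b0`, `a1` and `b1` are introduced here for the prior and posterior parameters.

```r
# Prior: gamma with mean 0.2 and standard deviation 0.2.
# In the shape/rate parameterisation, mean = a / b and sd = sqrt(a) / b.
prior_mean <- 0.2; prior_sd <- 0.2
a0 <- (prior_mean / prior_sd)^2   # prior shape
b0 <- prior_mean / prior_sd^2     # prior rate

# Data: 30 exponential observations with sample mean 2.5 minutes.
n <- 30
total_time <- n * 2.5

# Gamma prior with exponential likelihood: shape gains n, rate gains the total time.
a1 <- a0 + n
b1 <- b0 + total_time

print(c(post_mean = a1 / b1, post_sd = sqrt(a1) / b1))
```

The derivation asked for in the question amounts to showing why the update takes this additive form.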

Question 7 (10 marks)

P. Cortez and A. Silva examined students’ achievement in secondary education in two Portuguese schools during 2006 (total sample size, 382). In the following R code, the authors tried to examine the relationship between the students’ final grade (G3) and their first grade (G1). Each grade is an integer between 0 and 20, inclusive.

> reg <- lm(formula = G3 ~ G1, data = grades)
> summary(reg)

Call:
lm(formula = G3 ~ G1, data = grades)

Residuals:
     Min       1Q   Median       3Q      Max
-11.6706  -0.7974   0.3294   1.7098   5.0902

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.85100    0.48392  -3.825 0.000153 ***
G1           1.12680    0.04258  26.462  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.784 on 380 degrees of freedom
Multiple R-squared:  0.6482, Adjusted R-squared:  0.6473
F-statistic: 700.3 on 1 and 380 DF,  p-value: < 2.2e-16

(a) Give an estimate of the slope ($\beta$) coefficient.

(b) Test the null hypothesis that $\beta = 1$ against a two-sided alternative, using a significance level of 0.02.

(d) Which of the following R commands would you use to provide an estimated range for Liqi’s final mark? Justify your answer.

> predict(reg, newdata = data.frame(G1 = 13), interval = "prediction")
> predict(reg, newdata = data.frame(G1 = 13), interval = "confidence")

(e) Calculate the correlation coefficient between G1 and G3.

(f) Consider the ANOVA interpretation of regression. For the linear model $y_i = \alpha + \beta x_i$, the MS(R) can be written as $\hat{\beta}^2 \sum_i (x_i - \bar{x})^2$.

Show that $E[\mathrm{MS}(R)] = \sigma^2 + \beta^2 \sum_i (x_i - \bar{x})^2$.
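Several parts of this question can be worked directly from the numbers in the `summary(reg)` output without access to the `grades` data. The sketch below illustrates two such calculations under the usual simple linear regression assumptions: a t statistic for a slope of 1 (the printed t value tests a slope of 0 instead) and the correlation recovered from R-squared. Variable names are chosen here for illustration.

```r
# Values copied from the regression output in Question 7.
b_hat <- 1.12680     # estimated slope for G1
se_b  <- 0.04258     # its standard error
df    <- 380         # residual degrees of freedom
r_sq  <- 0.6482      # multiple R-squared

# t statistic for H0: slope = 1, two-sided test at level 0.02.
t_stat <- (b_hat - 1) / se_b
t_crit <- qt(1 - 0.02 / 2, df)
reject <- abs(t_stat) > t_crit

# In simple linear regression, the correlation has the sign of the slope
# and squared magnitude equal to R-squared.
r <- sign(b_hat) * sqrt(r_sq)

print(c(t_stat = t_stat, t_crit = t_crit, r = r))
print(reject)
```

With the original data available, `cor(grades$G1, grades$G3)` would give the correlation directly.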

Question 8 (11 marks)

In the first week of semester we did a ‘fun quiz’ to predict how long the border between NSW and Victoria will be closed. The responses to this were saved in a data frame that contains two columns: subject (either ‘MAST20005’ or ‘MAST90058’ for each student, depending on which subject they were enrolled in) and answer (the number of days predicted by each student). The data was analysed by Julia using R. Some output from this is shown below.

> table(quiz$subject)

MAST20005 MAST90058
       99        17
> mast20005 <- quiz$answer[quiz$subject == "MAST20005"]
> mast90058 <- quiz$answer[quiz$subject == "MAST90058"]
> summary(mast20005)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.00   39.00   60.00   67.79   85.00  243.00
> summary(mast90058)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  20.00   43.00   63.00   63.29   75.00  145.00
> sd(mast20005)
[1] 42.01998
> sd(mast90058)
[1] 31.30049

(a) Calculate a 95% confidence interval for the ratio of variances of the two groups, $\sigma^2_{\text{MAST20005}} / \sigma^2_{\text{MAST90058}}$. Assume the data are a random sample from each group of students. Is it plausible that the two groups have similar variance?

(b) Assume that a gamma distribution, $f(x) = \beta^{\alpha} x^{\alpha-1} e^{-\beta x} / \Gamma(\alpha)$, is a good approximation to the data in the answer column for the MAST20005 group. Show that the sample mean for this group, $\bar{x}$, is equal to the ratio of the maximum likelihood estimators, i.e. $\bar{x} = \hat{\alpha}/\hat{\beta}$.

Hint: it is not possible to derive separate formulae for $\hat{\alpha}$ and $\hat{\beta}$.

(d) Calculate the method of moments (MM) estimates of $\alpha$ and $\beta$ for the MAST20005 group using Julia’s summary statistics of the data given at the beginning of the question.
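Two of the calculations in this question need only the printed summary statistics. The sketch below shows one way to set them up in R: an F-based interval for the variance ratio, which relies on normality of each group, and moment matching for a gamma distribution with shape and rate parameters. Whether the second moment uses the divisor n or n - 1 is a convention that varies between courses; the divisor n is assumed here, and all variable names are illustrative.

```r
# Summary statistics copied from the R output in Question 8.
n1 <- 99; n2 <- 17
s1 <- 42.01998; s2 <- 31.30049
xbar1 <- 67.79

# 95% interval for the variance ratio, based on
# (S1^2 / sigma1^2) / (S2^2 / sigma2^2) ~ F(n1 - 1, n2 - 1).
ratio <- s1^2 / s2^2
ci <- c(lower = ratio / qf(0.975, n1 - 1, n2 - 1),
        upper = ratio / qf(0.025, n1 - 1, n2 - 1))

# Moment matching for a gamma with shape alpha and rate beta:
# mean = alpha / beta and variance = alpha / beta^2.
v1 <- (n1 - 1) / n1 * s1^2   # second central sample moment (divisor n, assumed)
beta_mm  <- xbar1 / v1
alpha_mm <- xbar1^2 / v1

print(ci)
print(c(alpha = alpha_mm, beta = beta_mm))
```

Replacing `v1` with `s1^2` gives the variant that uses the unbiased sample variance instead.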
