
MAST20005 Statistics

2018

Question 1 (10 marks) You have random samples from two groups, $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$. Some R output from analysing these data is below. The variable x contains the observations of X and the variable y contains the observations of Y.

> summary(x)

Min. 1st Qu. Median Mean 3rd Qu. Max.

2.620 3.320 4.040 4.193 4.515 6.300

> summary(y)

Min. 1st Qu. Median Mean 3rd Qu. Max.

4.150 4.450 5.355 5.212 5.595 6.720

> sd(x)

[1] 1.200121

> sd(y)

[1] 0.8368034

> sort(y)

[1] 4.15 4.25 4.34 4.78 5.30 5.41 5.46 5.64 6.07 6.72

(a) For each of the following quantities, state or calculate its value if possible, or otherwise explain why it is not possible.

(i) $x_{(1)}$

(ii) $y_{(4.5)}$

(iii)

(iv) $\sigma_2$

(b) For each of the following statements, state whether it is true, false, or impossible to determine from the given information. In each case state the values of the quantities, if possible.

(i) $x_{(1)} > y_{(1)}$

(ii) $\bar{x} > \bar{y}$

(iii) $\sigma_1 > \sigma_2$

(c) For each of the following pairs of hypotheses, carry out the test if it is possible, using a 5% significance level, or otherwise explain what further information you need in order to do it.

(i) $H_0\colon \mu_1 = \mu_2$ versus $H_1\colon \mu_1 \neq \mu_2$

(ii) $H_0\colon \sigma_2 = 2$ versus $H_1\colon \sigma_2 \neq 2$
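The `summary(y)` and `sd(y)` output in Question 1 can be cross-checked against the sorted values of y. The following is a minimal Python sketch (standard library only), assuming R's default quantile rule (type 7), which `statistics.quantiles` reproduces with `method="inclusive"`. The variable names are illustrative and not part of the paper.

```python
import statistics

# Observations of Y, copied from the sort(y) output above.
y = [4.15, 4.25, 4.34, 4.78, 5.30, 5.41, 5.46, 5.64, 6.07, 6.72]

y_mean = statistics.mean(y)
y_sd = statistics.stdev(y)  # sample sd with n - 1 divisor, like R's sd()

# Quartiles by linear interpolation between order statistics
# (assumed to correspond to R's default, type 7).
q1, med, q3 = statistics.quantiles(y, n=4, method="inclusive")

print(f"mean={y_mean:.3f}  sd={y_sd:.7f}")
print(f"Q1={q1:.3f}  median={med:.3f}  Q3={q3:.3f}")
```

The same check is not possible for x, because only its summary statistics are given.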

Question 2 (9 marks) A random sample on X produced the following observations:

5.5 5.8 6.0 6.6 6.8 6.9 7.1 7.3 7.5 8.7

For these data, we have $\bar{x} = 6.82$ and $s = 0.932$.

(a) Let µ = E(X). Calculate a 95% confidence interval for µ, assuming that X is normally distributed.

(b) Let p = Pr(X > 6.5). Calculate a 95% confidence interval for p.

(c) Let m be the median of X. Calculate a distribution-free confidence interval for m, with an approximate confidence level of 90%.
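The summary values quoted in Question 2 can be recomputed from the ten observations. This is a small Python sketch (standard library only) that checks the stated mean and standard deviation and counts the observations above 6.5; the variable names are assumptions made for illustration.

```python
import statistics

# Observations on X, copied from Question 2.
x = [5.5, 5.8, 6.0, 6.6, 6.8, 6.9, 7.1, 7.3, 7.5, 8.7]

n = len(x)
x_bar = statistics.mean(x)
s = statistics.stdev(x)  # sample sd with n - 1 divisor

# Number of observations exceeding the threshold used in part (b).
above = sum(1 for v in x if v > 6.5)

print(f"n={n}  mean={x_bar:.2f}  s={s:.3f}  count(X > 6.5)={above}")
```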

Question 3 (11 marks) Consider a random sample of size n on X which has a geometric distribution with parameter θ. Its pmf is:

$$p_X(x) = \theta(1 - \theta)^x, \quad x \in \{0, 1, 2, \dots\}$$

and it has mean (1 - θ)/θ .

(a) Determine a sufficient statistic for θ.

(b) Find the method of moments estimator of θ .

(d) Find the Cramér–Rao lower bound for unbiased estimators of θ.

(e) Derive an expression for the standard error of the MLE.

(f) A random sample of size n = 20 produced the following observations:

3 2 0 1 1 3 0 0 0 0 0 3 0 3 1 1 0 2 2 0

Estimate θ and calculate an approximate 90% confidence interval.
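The pmf and mean stated at the start of Question 3 can be checked numerically. The sketch below (Python, standard library only) sums the pmf over a truncated support for a hypothetical value θ = 0.4, chosen purely for illustration, and then tabulates the sample from part (f). The function name `geom_pmf` and the truncation point are assumptions.

```python
def geom_pmf(x, theta):
    """pmf of the geometric distribution on {0, 1, 2, ...}."""
    return theta * (1 - theta) ** x

theta = 0.4           # hypothetical value, for illustration only
support = range(200)  # truncation; the neglected tail is negligible here

total_prob = sum(geom_pmf(x, theta) for x in support)
mean = sum(x * geom_pmf(x, theta) for x in support)

print(f"sum of pmf = {total_prob:.6f}")
print(f"mean = {mean:.6f}, (1 - theta)/theta = {(1 - theta) / theta:.6f}")

# Sample from part (f).
sample = [3, 2, 0, 1, 1, 3, 0, 0, 0, 0, 0, 3, 0, 3, 1, 1, 0, 2, 2, 0]
print(f"n = {len(sample)}, total = {sum(sample)}")
```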

Question 4 (12 marks) Laleh has bought a new toaster. It has a dial that allows her to set the ‘strength’ of toasting. She is not sure if it works very well and decides to run some experiments. She sets the dial to various values, x, and measures how long the toaster cooks the bread, Y, before it pops the bread out. She does a simple linear regression analysis of these data using the model $E(Y \mid x) = \alpha + \beta x$. Some partial R output from her analysis is shown below.

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7323     1.8852
x             1.1556     0.3818
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.686 on 28 degrees of freedom
Multiple R-squared: 0.2465, Adjusted R-squared: 0.2196
F-statistic: 9.159 on 1 and 28 DF, p-value: 0.005262

(a) How many experiments did Laleh carry out?

(b) Carry out the following hypothesis tests, using a 5% significance level.

(i) $H_0\colon \alpha = 0$ versus $H_1\colon \alpha \neq 0$

(ii) $H_0\colon \beta = 0$ versus $H_1\colon \beta \neq 0$

(d) Write out the ANOVA table for this regression model fit. (Hint: the F statistic is shown in the above R output.)
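The partial R output in Question 4 is internally consistent, which can be checked with two standard identities for simple linear regression: the F statistic equals the square of the slope's t statistic, and $R^2 = F / (F + \text{df}_{\text{residual}})$. The following Python sketch applies them to the printed values; small discrepancies are expected because the output is rounded, and the variable names are illustrative.

```python
# Values copied from the R output in Question 4.
slope, slope_se = 1.1556, 0.3818
f_stat, resid_df = 9.159, 28
r_squared = 0.2465

# t statistic for the slope: estimate divided by its standard error.
t_slope = slope / slope_se

# With a single predictor, F = t^2 and R^2 = F / (F + residual df).
r2_from_f = f_stat / (f_stat + resid_df)

print(f"t = {t_slope:.3f}, t^2 = {t_slope ** 2:.3f}, reported F = {f_stat}")
print(f"R^2 from F = {r2_from_f:.4f}, reported R^2 = {r_squared}")
```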

Question 5 (13 marks) On his way to work in the mornings, Damjan records how long he has to wait for his train at the station. Over a series of days, he observes the following times (in minutes):

4.8 1.2 3.7 0.9 0.7 0.3 0.9 3.2 1.4

For these data we have $\bar{x} = 1.9$ and $s = 1.58$. Damjan decides to use an exponential distribution with mean θ as a model for these data. He would like to estimate his median waiting time, m.

(a) Express m in terms of θ .

(b) Damjan decides to use the sample median, $\hat{M}$, as his estimator.

(i) What is the asymptotic sampling distribution of $\hat{M}$?

(ii) What is Damjan’s estimate of m for this dataset?

(iii) Calculate a standard error for Damjan’s estimate.

(i) Show that this estimator is biased.

(ii) Let T = c be an adjusted estimator. Find c so that T is unbiased.

(iii) Determine var(T).

(iv) Which of $\hat{M}$ and T is the better estimator?

(v) What is the estimate, t, based on the data above?

(vi) Calculate a standard error for this estimate.

Question 6 (10 marks) On his way to the office early every morning, Allan walks past South Lawn and counts how many students he sees there. He decides to model these counts using a Poisson distribution with pmf,

$$p(x \mid \theta) = \frac{e^{-\theta}\,\theta^{x}}{x!}$$

Across the first 45 days of semester, he observes on average 3.8 students per day. Robert says that he did a similar survey last year and got on average about 3.0 students per day, although he cannot remember over how many days he observed them. Allan would like to estimate θ by combining this information appropriately.

(a) Show that the gamma distribution is a conjugate prior for θ .

Note that the pdf of θ ~ Gamma(α, β) is,

$$f(\theta \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha - 1} e^{-\beta\theta}, \quad (\theta > 0)$$

and it has mean E(θ) = α/β .

(b) Allan decides to treat Robert’s information as if it came from 10 days of sampling (i.e. as pseudodata). Determine the parameters of the prior that encode this information appropriately.

(d) Calculate the posterior mean.
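The note in part (a) gives the gamma pdf and its mean α/β. A quick numerical check is sketched below in Python (standard library only), assuming the usual rate parameterisation with normalising constant $\beta^{\alpha}/\Gamma(\alpha)$, which is consistent with the stated mean. The values α = 3 and β = 2 are hypothetical and chosen only for illustration; the helper names are not from the paper.

```python
import math

def gamma_pdf(theta, alpha, beta):
    """Gamma(alpha, beta) density with rate parameter beta (assumed form)."""
    return (beta ** alpha / math.gamma(alpha)
            * theta ** (alpha - 1) * math.exp(-beta * theta))

def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

alpha, beta = 3.0, 2.0  # hypothetical parameters, for illustration only

# The upper limit 40 truncates a negligible tail for these parameters.
total = integrate(lambda t: gamma_pdf(t, alpha, beta), 0.0, 40.0)
mean = integrate(lambda t: t * gamma_pdf(t, alpha, beta), 0.0, 40.0)

print(f"integral of pdf = {total:.6f}")
print(f"numerical mean = {mean:.6f}, alpha/beta = {alpha / beta:.6f}")
```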

Question 7 (13 marks) Ben runs a sports program for high school students. Each student that enters his program chooses one of the following three sports: volleyball, basketball and netball.

The heights of the first 15 students, and the sports they chose, were:

Volleyball 183, 176, 170, 179

Basketball 165, 169, 171, 154, 165, 159

Netball 167, 160, 177, 173, 170

Ben runs an analysis of variance with these data. A partially complete ANOVA table from his analysis is given below.

Source              df    SS       MS    F
Treatment (sport)
Error                              38
Total                     872.4

(a) Is there evidence of a relationship between the students’ heights and their choice of sport?

(b) What is the sampling distribution of 2 ?

(d) Let $X^*$ be the height of the next student who chooses volleyball. Show that $X^* - \bar{X}_1 \sim N\left(0, \tfrac{5}{4}\sigma^2\right)$.

(e) Calculate a 95% prediction interval for $X^*$.
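The two entries given in the partial ANOVA table of Question 7 can be recomputed from the height data. The sketch below (Python, standard library only) computes the total sum of squares and the error mean square under the usual one-way ANOVA decomposition; it suggests that 872.4 is the total SS and that 38 is the (rounded) error MS. The dictionary layout and variable names are assumptions made for illustration.

```python
# Heights by sport, copied from Question 7.
heights = {
    "Volleyball": [183, 176, 170, 179],
    "Basketball": [165, 169, 171, 154, 165, 159],
    "Netball": [167, 160, 177, 173, 170],
}

all_heights = [h for group in heights.values() for h in group]
n, k = len(all_heights), len(heights)
grand_mean = sum(all_heights) / n

# Total SS: squared deviations from the grand mean.
ss_total = sum((h - grand_mean) ** 2 for h in all_heights)

# Error SS: squared deviations from each group's own mean.
ss_error = sum(
    sum((h - sum(group) / len(group)) ** 2 for h in group)
    for group in heights.values()
)
ms_error = ss_error / (n - k)

print(f"n = {n}, groups = {k}")
print(f"SS total = {ss_total:.1f}, MS error = {ms_error:.2f}")
```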

Question 8 (12 marks) Each person has one of the genotypes A, B or C. According to the Hardy–Weinberg law in genetics, these three genotypes should occur in the population in the proportions $\theta^2$, $2\theta(1 - \theta)$ and $(1 - \theta)^2$, respectively, for some $\theta \in [0, 1]$. In a sample of 600 individuals from the population, you observe 27 individuals with genotype A, 186 individuals with genotype B and 387 individuals with genotype C.

(a) Find the maximum likelihood estimate of θ .

(b) Calculate a standard error for this estimate.
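The Hardy–Weinberg proportions in Question 8 always sum to one, since $\theta^2 + 2\theta(1-\theta) + (1-\theta)^2 = (\theta + (1 - \theta))^2$. The Python sketch below illustrates this and shows the expected genotype counts in a sample of 600 for a hypothetical θ = 0.3, which is chosen only for illustration and is not an estimate. The function name `hw_proportions` is an assumption.

```python
def hw_proportions(theta):
    """Hardy-Weinberg proportions for genotypes A, B and C."""
    return theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2

# Observed counts, copied from Question 8.
counts = {"A": 27, "B": 186, "C": 387}
n = sum(counts.values())

theta = 0.3  # hypothetical value, for illustration only
props = hw_proportions(theta)
expected = [n * p for p in props]

print(f"n = {n}, proportions sum to {sum(props):.6f}")
for genotype, e in zip(counts, expected):
    print(f"{genotype}: observed {counts[genotype]}, expected {e:.1f}")
```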