PSTAT 105: Solutions to Practice Final Problems 2022
1. Suppose that we observed the following 15 observations
120.96 134.00 189.49 245.55 339.11 353.46 355.98 366.83 445.93 479.78 486.28 537.99 558.12 729.46 748.85
We want to estimate the density at t = 450
(a) The kernel density estimator of the density at 450 for a rectangular kernel with bandwidth 50
(i.e. the kernel is constant from 400 to 500 and 0 elsewhere) is
f̂(450) = [F̂n(500) − F̂n(400)] / 100 = (number of observations in [400, 500]) / (15 × 100) = 3/1500 = 0.002
(b) The variance of this estimator is
Var(f̂(450)) = p(1 − p) / (n h^2), where p = P{400 < X < 500} and h = 100 is the width of the kernel. Plugging in p̂ = 3/15 = 0.2,
Var(f̂(450)) ≈ (0.2)(0.8) / (15 × 100^2) = 1.067 × 10^−6
(c) The difficulty we face in estimating the bias of this estimator is that we don’t know the true density f(t).
(d) We may choose to take a bandwidth larger than 50 in order to decrease the variance of the estimator. This will result in a smoother density estimate.
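The estimate in (a) and the variance in (b) are easy to check numerically. A short sketch in Python (the variable names are my own):

```python
# the 15 observations from the problem
x = [120.96, 134.00, 189.49, 245.55, 339.11, 353.46, 355.98, 366.83,
     445.93, 479.78, 486.28, 537.99, 558.12, 729.46, 748.85]
n = len(x)
width = 100  # the kernel is constant on [400, 500], an interval of width 100

# (a) rectangular kernel estimate: proportion of points in [400, 500] per unit length
count = sum(1 for xi in x if 400 <= xi <= 500)
f_hat = count / (n * width)        # 3 / 1500 = 0.002

# (b) binomial variance of the estimator, with p estimated by count/n
p_hat = count / n                  # 0.2
var_hat = p_hat * (1 - p_hat) / (n * width**2)   # about 1.067e-6
```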
2. Ms. Morgan takes a poll of her second grade class and asks each student how many children are in their family. The results were
Size of Family        1          2            3       4+
Count                24         16            6        5
Prob            (1 − p)   p(1 − p)   p^2(1 − p)      p^3
(a) The log-likelihood is (we treat the last column as observations censored at 3)
l(p) = (1 − p)^24 [p(1 − p)]^16 [p^2(1 − p)]^6 (p^3)^5 = p^43 (1 − p)^46
log(l(p)) = 43 log p + 46 log(1 − p)
(d/dp) log(l(p)) = 43/p − 46/(1 − p) = 0
⟹ p̂ = 43/89 = 0.48315
(b) The probability of each column can now be calculated using the MLE.
Size of Family      1        2        3       4+
Count              24       16        6        5
Prob           0.5169   0.2497   0.1206   0.1128
Expected        26.36    12.74     6.15     5.75
where the expected values are 51 times the probabilities; for example, P{size = 3} = p̂^2(1 − p̂) = 0.1206 and 51 × 0.1206 = 6.15.
The statistic is
X^2 = (24 − 26.36)^2/26.36 + (16 − 12.74)^2/12.74 + (6 − 6.15)^2/6.15 + (5 − 5.75)^2/5.75
    = 0.2112 + 0.8368 + 0.0038 + 0.0983 = 1.15.
(c) The test statistic would have 2 degrees of freedom (4 categories − 1 − 1 estimated parameter) because we estimated the parameter p. FYI, the P value would be
> 1 - pchisq(1.15, 2)
[1] 0.5627
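The whole calculation can be verified numerically. A Python sketch (the closed-form p̂ = 43/89 comes from the derivative of the log-likelihood above, and for 2 degrees of freedom the chi-square survival function is simply exp(−x/2)):

```python
import math

counts = [24, 16, 6, 5]          # family sizes 1, 2, 3, 4+
n = sum(counts)                  # 51 students

# MLE: log l(p) = 43 log p + 46 log(1 - p)  =>  p_hat = 43/89
p = 43 / 89                      # 0.48315

# cell probabilities (1-p), p(1-p), p^2(1-p), p^3 and expected counts
probs = [1 - p, p * (1 - p), p**2 * (1 - p), p**3]
expected = [n * q for q in probs]

# Pearson chi-square statistic
x2 = sum((o - e)**2 / e for o, e in zip(counts, expected))

# P value for 2 degrees of freedom: survival function is exp(-x/2)
p_value = math.exp(-x2 / 2)      # matches 1 - pchisq(1.15, 2) in R
```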
3. We have 345 observations on the hourly log-return for a certain bond index, and we want to use a Kolmogorov–Smirnov test of the goodness of fit for a standard normal distribution.
(a) The first step in performing this test is to transform our data set returns,
Un <- pnorm(returns, mean=0, sd=1)
The purpose of this transformation is to produce Uniform[0, 1] observations under the null hypothesis. It is a sort of standardization, and we use it because the KS test is designed to test the null hypothesis that the data come from a uniform distribution.
(b) Our software calculates the upper and lower statistics as
Dn+ = 0.0599, Dn− = 0.1314, so the test statistic is Dn = max(Dn+, Dn−) = 0.1314.
An appropriate approximation for the P value of this test is
P ≈ 2 Σ_{j=1}^∞ (−1)^(j−1) exp(−2 j^2 n Dn^2) = 2 [exp(−2(345)(0.1314)^2) − exp(−8(345)(0.1314)^2) + ...] = 1.34 × 10^−5
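The alternating series converges very quickly here; a quick check of the arithmetic in Python:

```python
import math

n, D = 345, 0.1314   # sample size and KS statistic max(Dn+, Dn-)

# P ~ 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 n D^2); a few terms suffice
P = 2 * sum((-1)**(j - 1) * math.exp(-2 * j**2 * n * D**2) for j in range(1, 10))
# at this sample size only the j = 1 term contributes noticeably
```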
(c) This P value is very small which means that we should reject the null hypothesis. There is a significant difference between the distribution of our returns and a normal(0,1) distribution. The standard normal distribution is not a good fit for our data.
4. In question 3, we used the specific null hypothesis of N(0, 1). However, we might not really believe that the mean log return is 0, and so we decide to subtract the sample mean from all of the data first. This centers the data so that it now has mean 0.
(a) We expect the KS test statistic to be smaller using the data with the mean removed because
subtracting the mean makes the mean of the result 0. The result is data that looks more like a standard normal, and therefore the distance between the distribution of the data and the standard normal is smaller.
For instance, suppose that the actual mean of the data was 100. Then we would expect the D statistic to be very large because the distribution is far away from a normal(0,1). However, if we subtract off the sample mean (which is nearly 100), then the resulting data set will be centered around 0. This will be closer to our null hypothesis, which has mean 0 and variance 1.
(b) The P value from question 3 would no longer be appropriate so we decide to do a simulation
study to approximate the probability. We produce 100,000 random replications of 345 standard normal random variables, and only 4,978 of those replications had a test statistic greater than or equal to our Dn .
This is like a binomial experiment where we have 100,000 independent trials and observed X = 4,978 successes. The estimate is
p̂ = X/n = 4978/100000 = 0.04978
Our 95% confidence interval for p in this binomial experiment is
p̂ ± 1.96 √(p̂(1 − p̂)/n) = 0.04978 ± 1.96 √(0.04978(0.95022)/100000),
which is [0.04843, 0.05113].
This is an estimate of the probability under the null hypothesis that we would exceed Dn. Thus, this interval is a confidence interval for our estimated P value.
(c) The margin of error in one of these binomial experiments is
1.96 √(p̂(1 − p̂)/n)
If we set this error to 0.0001, then solving for n,
0.0001 = 1.96 √(p̂(1 − p̂)/n)
n = 1.96^2 p̂(1 − p̂) / 0.0001^2 = 1.96^2 (0.04978)(1 − 0.04978) / 0.0001^2 ≈ 18,170,000
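Both the confidence interval in (b) and the sample-size calculation in (c) can be checked directly (a Python sketch):

```python
import math

X, n = 4978, 100_000
p_hat = X / n                                   # 0.04978

# (b) 95% confidence interval for the simulated P value
me = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - me, p_hat + me)                   # roughly [0.0484, 0.0511]

# (c) replications needed for a margin of error of 0.0001
n_needed = 1.96**2 * p_hat * (1 - p_hat) / 0.0001**2   # about 1.82e7
```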
5. We have 25 data observations, and we want to check if they are uniformly distributed between 0 and 10.
   0.1  2.4  3.4  5.4  7.3
   0.4  2.6  4.3  5.6  7.4
   0.4  2.6  4.8  6.0  7.9
   1.9  2.6  5.1  6.5  8.1
   1.9  3.3  5.2  6.8  8.5
(a) A χ2 goodness of fit test with four intervals to test the null hypothesis that these are from a
uniform distribution has expected values that are 25/4.
Interval   0-2.5   2.5-5   5-7.5   7.5-10
Obs            6       7       9        3
Expected    6.25    6.25    6.25     6.25
X^2 = (6 − 6.25)^2/6.25 + (7 − 6.25)^2/6.25 + (9 − 6.25)^2/6.25 + (3 − 6.25)^2/6.25
    = 0.01 + 0.09 + 1.21 + 1.69 = 3.00
We should compare this to a χ^2 distribution with 3 degrees of freedom. The critical value is χ^2_{3, 0.95} = 7.815. Since 3.00 < 7.815, we accept the null hypothesis. This test does not show that there is a statistically significant deviation from the uniform distribution. We conclude that the data is reasonably close to uniform in distribution.
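A numeric check of the statistic and its P value (a Python sketch; for 3 degrees of freedom the chi-square CDF has the closed form erf(√(x/2)) − √(2x/π) e^(−x/2), so no statistics library is needed):

```python
import math

obs = [6, 7, 9, 3]
e = 25 / 4                              # 6.25 expected in each of the 4 intervals
x2 = sum((o - e)**2 / e for o in obs)   # 3.00

# chi-square CDF for 3 degrees of freedom (closed form)
def chi2_cdf_df3(x):
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = 1 - chi2_cdf_df3(x2)   # about 0.39, well above 0.05
```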
(b) Alternatively, we could use the Kolmogorov–Smirnov goodness of fit test for the data and hypothesis from part (a). First, we transform the data so that the null hypothesis is a Uniform[0, 1]. In this case, that just means dividing by 10. Then we calculate
D+ = max_i ( i/n − U_(i) )    D− = max_i ( U_(i) − (i − 1)/n )
The test statistic D = max(D+, D−) for this test is calculated in table 1. It is D = 0.15.
(c) Our approximate P value for this KS statistic is
P ≈ 2 Σ_{j=1}^∞ (−1)^(j+1) exp(−2 j^2 n D^2)
  = 2 [exp(−2(25)(0.15)^2) − exp(−8(25)(0.15)^2) + exp(−18(25)(0.15)^2) − ...]
  = 2 [0.3247 − 0.0111 + 0.00004 − ...] = 0.6272
This is a large P value and we would conclude that we should accept the null hypothesis. There is not a significant difference between our data and the uniform distribution.
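Parts (b) and (c) can be reproduced end to end from the data (a Python sketch; variable names are my own):

```python
import math

data = [0.1, 0.4, 0.4, 1.9, 1.9, 2.4, 2.6, 2.6, 2.6, 3.3, 3.4, 4.3, 4.8,
        5.1, 5.2, 5.4, 5.6, 6.0, 6.5, 6.8, 7.3, 7.4, 7.9, 8.1, 8.5]
n = len(data)
u = sorted(x / 10 for x in data)        # transform so the null is Uniform[0, 1]

# one-sided statistics D+ and D-, then D = max of the two
d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
d_minus = max(ui - i / n for i, ui in enumerate(u))
D = max(d_plus, d_minus)                 # 0.15

# alternating-series approximation for the P value
P = 2 * sum((-1)**(j - 1) * math.exp(-2 * j**2 * n * D**2) for j in range(1, 10))
```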
(d) The advantage of the Kolmogorov–Smirnov test over the χ2 test in this problem is that it can detect differences over the whole interval. It does not rely on an arbitrary partition of the sample space.
6. We collected wind speeds on 20 days in November.
73.7 85.8 87.6 113.5 123.4
141.1 153.9 156.2 182.7 182.9
206.1 206.9 278.6 290.1 328.3
339.8 352.3 373.6 381.8 449.9
(a) Assuming that these are independent observations, the probability that a speed is less than 300 is estimated by
p̂ = (# of observations less than 300)/n = 14/20 = 0.7
(b) A 95% confidence interval for the probability estimate is
p̂ ± 1.96 √(p̂(1 − p̂)/n) = 0.7 ± 1.96 √(0.7(0.3)/20),
which is [0.4992, 0.9008].
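A quick arithmetic check of this interval (Python):

```python
import math

p_hat, n = 14 / 20, 20
me = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - me, p_hat + me)    # approximately (0.4992, 0.9008)
```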
(c) Using the triangular kernel
    K30(t) = 1/30 + t/900,  −30 < t < 0
             1/30 − t/900,  0 ≤ t < 30
             0,             elsewhere
(equivalently, K30(t) = 1/30 − |t|/900 for |t| < 30)
the estimate of the density of the data at 300 is
f̂(300) = (1/20) Σ_{i=1}^{20} K30(Xi − 300) = (1/20)[K30(278.6 − 300) + K30(290.1 − 300) + K30(328.3 − 300)]
        = (1/20)[0.009556 + 0.022333 + 0.001889]
        = 0.001689
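This calculation as a Python sketch (the kernel expression matches the `kern.bw` line in the bootstrap code of part (f)):

```python
wind = [73.7, 85.8, 87.6, 113.5, 123.4, 141.1, 153.9, 156.2, 182.7, 182.9,
        206.1, 206.9, 278.6, 290.1, 328.3, 339.8, 352.3, 373.6, 381.8, 449.9]

def k30(t):
    """Triangular kernel with bandwidth 30: 1/30 - |t|/900 on |t| < 30."""
    return 1 / 30 - abs(t) / 900 if abs(t) < 30 else 0.0

# kernel density estimate at 300: average kernel weight over all 20 points
f_hat = sum(k30(x - 300) for x in wind) / len(wind)   # about 0.001689
```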
(d) If this kernel K30 (t) has a bandwidth of 30, then the kernel function K90 (t) that has a bandwidth of 90 is
K_h(t) = (1/h) K_1(t/h)
so
K90(t) = (1/3) K30(t/3) = 1/90 + t/8100,  −90 < t < 0
                          1/90 − t/8100,  0 ≤ t < 90
                          0,              elsewhere
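To confirm the rescaling, a kernel rescaled this way still integrates to 1 (a Riemann-sum check in Python):

```python
def k90(t):
    # rescaled kernel: (1/3) * K30(t/3) = 1/90 - |t|/8100 on |t| < 90
    return 1 / 90 - abs(t) / 8100 if abs(t) < 90 else 0.0

dt = 0.01
area = sum(k90(-90 + i * dt) * dt for i in range(int(180 / dt)))
# area is 1 (up to discretization error), so K90 is a valid density
```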
(e) The advantage of using the wider bandwidth kernel is that it includes more observations and
therefore will have a smaller variance.
(f) In order to estimate the standard deviation of this kernel estimator, I decided to try to use a
bootstrap methodology.
> sds <- rep(-999, 8)
> for (j in 1:8) {
+   boot.wind <- matrix(sample(wind.speed, size = 20000, replace = TRUE), ncol = 20)
+   kern.bw <- (1/30 - abs(boot.wind - 300)/900) * (abs(boot.wind - 300) < 30)
+   hat.f <- rowMeans(kern.bw)
+   sds[j] <- sd(hat.f)
+ }
> signif(sds, 4)
[1] 0.001104 0.001158 0.001137 0.001154 0.001133 0.001200 0.001155 0.001171
> mean(sds)
[1] 0.001151571
> sd(sds)
[1] 2.817011e-05
Our bootstrap estimator of sd(f̂) is 0.001151571 from the code.
An approximate margin of error for this estimator can be computed from the 8 batches:
1.96 s/√8 = 1.96 (2.817 × 10^−5)/√8 ≈ 1.95 × 10^−5
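The same bootstrap idea can be sketched in Python (my own translation of the R code above, with a fixed seed; the exact numbers will differ slightly from run to run and from the R batches):

```python
import random
import statistics

wind = [73.7, 85.8, 87.6, 113.5, 123.4, 141.1, 153.9, 156.2, 182.7, 182.9,
        206.1, 206.9, 278.6, 290.1, 328.3, 339.8, 352.3, 373.6, 381.8, 449.9]

def k30(t):
    # triangular kernel with bandwidth 30, as in part (c)
    return 1 / 30 - abs(t) / 900 if abs(t) < 30 else 0.0

random.seed(105)
B = 1000  # bootstrap replications
# each replication: resample 20 speeds and recompute the kernel estimate at 300
f_hats = [sum(k30(x - 300) for x in random.choices(wind, k=20)) / 20
          for _ in range(B)]
sd_boot = statistics.stdev(f_hats)   # around 0.00115, in line with the R batches
```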
7. The following data was collected on the lifetimes of rubber gaskets in marine applications.
10 12 37 55 59 63 68 77
109 111 126 158 162 163 188 197
I want to test whether this data is from an Exponential distribution with mean 100, i.e. P0{X < x} = 1 − e^(−x/100).
(a) If we are going to use a χ^2 test, first we need to divide this data into 3 appropriate intervals. One reasonable approach is to take the first interval to go from 0 to the point x where
1 − e^(−x/100) = 1/3 ⟹ e^(−x/100) = 2/3 ⟹ x = 100 log 1.5 = 40.5.
The probability P0 {X < 40.5} = 1/3.
The next interval has its upper bound at x = 100 log 3 = 109.9. This interval has probability P{40.5 < X < 109.9} = 2/3 − 1/3 = 1/3.
Then the third interval is anything greater than 109.9. It would also have probability 1/3.
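The interval construction can be checked numerically, along with the observed counts in each interval (a Python sketch; the counts would feed the χ^2 statistic in the next step):

```python
import math

life = [10, 12, 37, 55, 59, 63, 68, 77, 109, 111, 126, 158, 162, 163, 188, 197]

# cut points giving each interval probability 1/3 under Exp(mean = 100)
b1 = 100 * math.log(1.5)   # 40.5: solves 1 - exp(-x/100) = 1/3
b2 = 100 * math.log(3)     # 109.9: solves 1 - exp(-x/100) = 2/3

obs = [sum(1 for x in life if x <= b1),
       sum(1 for x in life if b1 < x <= b2),
       sum(1 for x in life if x > b2)]   # observed counts in the 3 intervals
# each interval has expected count 16/3 under the null
```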
2022-03-14