MTH6991/MTH791U/MTH791P Computational Statistics with R Exercise Sheet 4 Spring 2023
Problems for handing in
1. (30 marks)
This question uses a dataset on QMPlus, which is not the same as the dataset for the previous three exercise sheets. For each student, there should be a file called “exercise4 XYZ.txt”, where XYZ is your ID number (you need to be logged in to QMPlus). If you cannot see a file, please send me an email.
Hand in: along with your answers, also include the graphs, but apart from that don’t copy any R output. Within R, right-clicking on a graph gives the options of saving it to disk or copying it (e.g. to paste into a Word document).
The dataset contains one column, called x. Using R, generate three kernel density estimates (KDE) using bandwidths of h = 0.2, 1 and 2, and the Gaussian kernel. Plot each of these KDEs.
(a) What is the effect of changing h on the appearance of the graphs? What is the main feature of the data that these plots show?
(b) What is the bandwidth that R calculates (still with the Gaussian kernel) if you do not specify a bandwidth?
(c) Also generate and plot a KDE using a bandwidth of 1 and the rectangular kernel. Which produces a smoother curve (for h = 1), the rectangular or Gaussian kernel?
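The calls involved in question 1 can be sketched as follows. This is only a minimal illustration: the vector x below is simulated placeholder data (an assumption), standing in for the column x of your own file, and the variable names are hypothetical. Note that in R's density() the argument bw is the standard deviation of the smoothing kernel, for every kernel choice.

```r
# Placeholder data (assumption): replace with the column x from your own file
set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))

# Gaussian-kernel KDEs for the three bandwidths in the question
bandwidths <- c(0.2, 1, 2)
kdes <- lapply(bandwidths, function(h) density(x, bw = h, kernel = "gaussian"))

for (i in seq_along(kdes)) {
  plot(kdes[[i]], main = paste("Gaussian kernel, h =", bandwidths[i]))
}

# If bw is omitted, density() chooses a bandwidth itself and stores it in $bw
kde_default <- density(x, kernel = "gaussian")
kde_default$bw

# Same bandwidth, rectangular kernel
kde_rect <- density(x, bw = 1, kernel = "rectangular")
plot(kde_rect, main = "Rectangular kernel, h = 1")
```

Printing a density object (for example kde_default) also reports the bandwidth that was used.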
2. (20 marks)
Let n > 1 denote the size of a sample drawn from an unknown distribution F with pdf f . Suppose you are interested in using a kernel density estimator (KDE) to estimate f using an Epanechnikov kernel. Which one(s) of the following bandwidth specifications would you think is an advisable choice to make? Motivate your answer.
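\[
h_n = n^{-2}, \qquad h_n = \text{[expression missing in source]}, \qquad h_n = n^{1/2}, \qquad h_n = \text{[expression missing in source]}.
\]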
3) Without using R, just with pen and paper, calculate the histogram estimator $\hat{f}_H(y)$ of the probability density function (pdf) using the following data:
0.5, 4.9, 6.5, 4.4, 7.5, 6.9, 1.2, 6.7, 5.8, 4.7
Use bins (intervals) with boundaries at 0, 2, 4, 6, 8.
So you would need to fill in the ? in the following:
\[
\hat{f}_H(y) =
\begin{cases}
?, & y \le 0 \\
?, & 0 < y \le 2 \\
?, & 2 < y \le 4 \\
?, & 4 < y \le 6 \\
?, & 6 < y \le 8 \\
?, & y > 8
\end{cases}
\]
Now use R to draw a histogram with these data, using the same intervals, and check that the probability density function estimate is the same as you calculated by hand.
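One way the R check might look is sketched below; the variable names are hypothetical. By default hist() uses right-closed bins (a, b], which matches the intervals above, and freq = FALSE puts the density rather than the counts on the vertical axis.

```r
# Data and bin boundaries as given in question 3
y <- c(0.5, 4.9, 6.5, 4.4, 7.5, 6.9, 1.2, 6.7, 5.8, 4.7)
breaks <- c(0, 2, 4, 6, 8)

# Draw the histogram on the density scale
h <- hist(y, breaks = breaks, freq = FALSE, main = "Histogram estimate")

h$counts   # number of observations in each bin
h$density  # counts / (n * bin width), to compare with the hand calculation

# Sanity check: the estimated density integrates to 1 over the bins
sum(h$density * diff(breaks))
```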
4) For a general kernel function K (which is by definition a pdf), if σ > 0 is the standard deviation of this pdf, then we can define the rescaled kernel K∗ by K∗(x) = σK(σx). Show that K∗ is a pdf, and that it has standard deviation 1.
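A numerical sanity check of this claim for one particular kernel can be sketched in R. It is not a proof, and the choice of kernel is an assumption made purely for illustration: the Epanechnikov kernel in its usual form K(x) = 0.75(1 − x²) on [−1, 1], whose variance is 1/5.

```r
# Epanechnikov kernel on [-1, 1] (assumed form); variance 1/5, so sigma = 1/sqrt(5)
K <- function(x) ifelse(abs(x) <= 1, 0.75 * (1 - x^2), 0)
sigma <- 1 / sqrt(5)

# Rescaled kernel from question 4: K*(x) = sigma * K(sigma * x)
K_star <- function(x) sigma * K(sigma * x)

# K* is supported on [-1/sigma, 1/sigma], so integrate over that interval
lim <- 1 / sigma
total  <- integrate(K_star, -lim, lim)$value
mean1  <- integrate(function(x) x * K_star(x), -lim, lim)$value
second <- integrate(function(x) x^2 * K_star(x), -lim, lim)$value

total                    # should be close to 1
sqrt(second - mean1^2)   # should be close to 1
```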
5) Using the same data as question 3), without using R, calculate the kernel density estimate $\hat{f}_{n,h}(y)$ using the triangular kernel, and with bandwidth h = 1, for the values of y = 0, 1, 3 and 4.
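A hand calculation of this kind can be checked afterwards with a few lines of R. The sketch below assumes the convention K(u) = 1 − |u| for |u| ≤ 1 (and 0 otherwise) for the triangular kernel, which the sheet does not state explicitly; the function names are hypothetical. R's built-in density(..., kernel = "triangular") rescales the kernel so that bw is its standard deviation, so it would not reproduce values computed under this convention.

```r
# Triangular kernel, assuming K(u) = 1 - |u| for |u| <= 1 and 0 otherwise
K_tri <- function(u) pmax(1 - abs(u), 0)

# KDE at a single point y0: (1 / (n h)) * sum_i K((y0 - x_i) / h)
kde_at <- function(y0, data, h) {
  sum(K_tri((y0 - data) / h)) / (length(data) * h)
}

# Data from question 3 and the evaluation points from question 5
obs <- c(0.5, 4.9, 6.5, 4.4, 7.5, 6.9, 1.2, 6.7, 5.8, 4.7)
y_vals <- c(0, 1, 3, 4)

est <- sapply(y_vals, kde_at, data = obs, h = 1)
est
```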
6) In R, simulate a sample of size 1000 from a beta distribution with parameters α = 1.3 and β = 3.0, which can be done with the command rbeta. Estimate a kernel density with Gaussian kernel and bandwidth 0.1. Plot the kernel density estimate and the true beta pdf on the same graph. The latter is given by
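\[
f(y; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, y^{\alpha - 1} (1 - y)^{\beta - 1}, \qquad y \in (0, 1),\ \alpha > 0,\ \beta > 0.
\]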
You could use the command curve(dbeta(x, shape1 = ..., shape2 = ...), add = TRUE) to add this true pdf to an existing graph, such as the plot of the KDE.
How do the true and estimated pdfs differ, and what feature of the true pdf do you think might cause this?
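The steps of question 6 might be put together as in the sketch below. The seed, variable names and plot styling are arbitrary choices made for illustration, not part of the question.

```r
set.seed(2023)  # arbitrary seed, only so that the sketch is reproducible

# Sample of size 1000 from Beta(1.3, 3.0)
samp <- rbeta(1000, shape1 = 1.3, shape2 = 3.0)

# Gaussian-kernel KDE with bandwidth 0.1
kde <- density(samp, bw = 0.1, kernel = "gaussian")
plot(kde, main = "KDE (solid) and true beta pdf (dashed)")

# Overlay the true pdf; curve() evaluates the expression over its own grid of x
curve(dbeta(x, shape1 = 1.3, shape2 = 3.0), add = TRUE, lty = 2)
```

Since the KDE is plotted on a grid that extends beyond the range of the sample, the graph also shows how the estimate behaves outside (0, 1).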
2023-03-03