GR5241 Statistical Machine Learning Spring 2023 Homework 4
Due: Tuesday, April 18th, 2023
Homework submission: Please submit your homework electronically through Canvas by 11:59pm on the due date. You need to submit both the pdf file and your code (either in R or Python).
Problem 1 (SVM and CV [30 points])
In this problem, we will apply a support vector machine to classify hand-written digits. You do not have to implement the SVM algorithm or cross-validation manually: the R library e1071 provides an implementation, see
http://cran.r-project.org/web/packages/e1071/index.html
Download the digit data set from Canvas. The zip archive contains two files; both are text files. Each file contains a matrix with one data point (= vector of length 256) per row. The 256-vector in each row represents a 16 × 16 image of a handwritten number. The data contains two classes, the digits 5 and 6, which can be labeled as -1 and +1, respectively. The image on the right shows the first row, re-arranged as a 16 × 16 matrix and plotted as a gray scale image.
Homework questions:
1.i Randomly select about 20% of the data and set it aside as a test set.
1.ii Train a linear SVM with soft margin. Cross-validate the margin parameter.
1.iii Train an SVM with soft margin and RBF kernel. You will have to cross-validate both the soft-margin
parameter and the kernel bandwidth.
1.iv After you have selected parameter values for both algorithms, train each one with the parameter value you
have chosen. Then compute the misclassification rate (the proportion of misclassified data points) on the test set.
1.v Plot the cross-validation estimates of the misclassification rate. Please plot the rate as
(a) a function of the margin parameter in the linear case.
(b) a function of the margin parameter and the kernel bandwidth in the non-linear case (you are encouraged to use a heat map here).
1.vi Report the test set estimates of the misclassification rates for both cases, with the parameter values you
have selected, and compare the two results. Is a linear SVM a good choice for this data, or should we use a non-linear one?
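For students working in Python rather than R, the workflow in 1.i–1.iv might be sketched as follows. This is only a sketch: the assignment names the R library e1071, so scikit-learn, the parameter grids, and the synthetic stand-in data below are all assumptions, not part of the assignment.

```python
# A possible Python workflow for 1.i-1.iv (a sketch; the assignment names the
# R library e1071, so scikit-learn and every name below are assumptions).
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 256-column digit matrices so the sketch is
# self-contained; in the homework, load the two Canvas text files instead
# (e.g. with np.loadtxt) and label the two classes -1 and +1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 256))
y = np.repeat([-1, 1], 100)

# 1.i: hold out about 20% of the data as a test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 1.ii: linear SVM with soft margin, cross-validating the margin parameter C.
lin = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=5)
lin.fit(X_tr, y_tr)

# 1.iii: RBF SVM, cross-validating both C and the kernel bandwidth gamma.
rbf = GridSearchCV(SVC(kernel="rbf"),
                   {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}, cv=5)
rbf.fit(X_tr, y_tr)

# 1.iv: misclassification rate on the test set for the selected parameters.
lin_err = np.mean(lin.predict(X_te) != y_te)
rbf_err = np.mean(rbf.predict(X_te) != y_te)
```

The cross-validation error surface needed for 1.v can be read off `lin.cv_results_` and `rbf.cv_results_` (for the RBF case, reshape `mean_test_score` into a C × gamma grid to draw the heat map).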
Problem 2 (Penalized Logistic Regression Lasso [40 points])
In this problem we will apply ℓ1-penalized logistic regression to classify hand-written digits. This is the same two-class dataset from Problem 1, i.e., classify digits 5 and 6, labeled as 0 and 1, respectively. In contrast to Problem 1, students are required to code the estimation algorithm manually. The objective function of interest is
Q(β0, β1, ···, βp) = L(β0, β1, ···, βp) + λ ∑_{j=1}^{p} |βj|,
where L(β0, β1, ···, βp) is the negative log-likelihood of the logistic regression model and λ > 0 is the tuning parameter. To solve this problem students must come up with an iterative algorithm that minimizes Q(β) with respect to β = (β0, β1, ···, βp)ᵀ. Students should understand how the lasso can be estimated using cost functions other than least squares, e.g., in this case, negative log-likelihood. The cyclic coordinate descent algorithm from lasso linear regression is a good starting point. Another good starting point is described on page 126 of the ESL text.
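One possible shape for such an algorithm, a sketch rather than the required solution, combines the IRLS working response with cyclic coordinate descent and soft-thresholding, in the spirit of the ESL reference above. All function and variable names below are ours. Note that glmnet() penalizes the *average* negative log-likelihood, so λ must be rescaled accordingly when comparing outputs.

```python
# Sketch of a logistic-lasso estimator (one possible approach, not the
# required solution): quadratic (IRLS) approximation of the negative
# log-likelihood, then cyclic coordinate descent with soft-thresholding.
import numpy as np

def soft_threshold(z, lam):
    """S_lam(z) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def logistic_lasso(X, y, lam, n_iter=100):
    """Minimize Q = NLL(b0, beta) + lam * sum_j |beta_j|."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        eta = b0 + X @ beta
        pr = 1.0 / (1.0 + np.exp(-eta))          # fitted probabilities
        w = np.maximum(pr * (1.0 - pr), 1e-5)    # IRLS weights (floored)
        z = eta + (y - pr) / w                   # working response
        b0 = np.sum(w * (z - X @ beta)) / np.sum(w)   # intercept: unpenalized
        for j in range(p):                       # one cyclic sweep
            r_j = z - b0 - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft_threshold(np.sum(w * X[:, j] * r_j), lam) \
                      / np.sum(w * X[:, j] ** 2)
    return b0, beta

# Hypothetical usage on synthetic data (true coefficients are ours):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_beta = np.array([1.5, -1.0, 0.0, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.3 + X @ true_beta))))
b0, beta = logistic_lasso(X, y, lam=1.0)
```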
Homework questions:
2.i Write down your logistic-lasso algorithm used to estimate parameters β0, β1, ···, βp.
2.ii Explain in detail how you came up with your logistic-lasso algorithm.
2.iii Implement your logistic-lasso algorithm on the toy dataset LogisticLasso.csv using tuning parameter λ = .05. Display the estimated coefficients and compare your results to the glmnet() output (or similar). For reference, the glmnet() code is displayed below:
my.lasso.data <- read.csv(file = "LogisticLasso.csv")
gml.model <- glmnet(x = as.matrix(my.lasso.data[, 1:10]), y = my.lasso.data[, 11],
                    family = "binomial", alpha = 1, intercept = TRUE, lambda = .05)
c(gml.model$a0, gml.model$beta[, 1])
2.iv Implement your logistic-lasso algorithm on the zip code dataset from Problem 1. Choose the tuning
parameter to be λ = .005. Display the first 30 (out of 257) estimated coefficients for both your model and the glmnet() function (or similar).
Some notes:
• This is a hard problem. You can receive full credit even if you can't perfectly match the glmnet() output.
• Try your best!
Problem 3 (Basic Theory Related to the Lasso [20 points])
3.i Consider the univariate lasso objective function with no bias:
Q(β) = (1/2) ∑i (yi − xi β)² + λ|β|.
Also suppose x is scaled so that ∑i xi² = 1, i.e.,
xi := xi / √(∑i xi²), i = 1, 2, . . . , n.
Derive a closed-form expression for the lasso solution, i.e., show that Q(β) is minimized at

βˆ = ∑i xi yi − λ   if ∑i xi yi > λ,
     0              if −λ ≤ ∑i xi yi ≤ λ,
     ∑i xi yi + λ   if ∑i xi yi < −λ.
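As a numerical sanity check (not a substitute for the derivation), the closed form can be compared against brute-force minimization of Q over a fine grid. This assumes the squared-error term carries the conventional factor of 1/2, which is what makes the thresholds come out at λ rather than λ/2; the data below is synthetic.

```python
# Numerical illustration of 3.i: with sum_i x_i^2 = 1, the minimizer of
#   Q(beta) = (1/2) * sum_i (y_i - x_i * beta)^2 + lam * |beta|
# is the soft-thresholded correlation sum_i x_i * y_i.  Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
x = x / np.sqrt(np.sum(x ** 2))                 # enforce sum_i x_i^2 = 1
y = 2.0 * x + rng.normal(scale=0.1, size=50)
lam = 0.5

s = np.sum(x * y)
beta_hat = np.sign(s) * max(abs(s) - lam, 0.0)  # claimed closed form

# Brute-force check: minimize Q over a fine grid (spacing 1e-3).
grid = np.linspace(-4.0, 4.0, 8001)
resid = y[:, None] - x[:, None] * grid[None, :]
Qvals = 0.5 * np.sum(resid ** 2, axis=0) + lam * np.abs(grid)
beta_grid = grid[np.argmin(Qvals)]              # agrees with beta_hat
```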
3.ii Consider the multivariate lasso objective function with no bias:
Q(β) = Q(β1, β2, . . . , βp) = (1/2) ∑i (yi − ∑j βj xij)² + λ ∑j |βj|.
Also suppose that the jth feature xj is scaled so that ∑i xij² = 1, i.e.,
xij := xij / √(∑i xij²), i = 1, 2, . . . , n, j = 1, 2, . . . , p.
Solve the equation ∂Q/∂βj = 0 for βj. Your final answer should be

βˆj = Sλ(xjᵀ r(j)),
where xj is the jth feature, r(j) is the partial residual, and Sλ is the soft thresholding operator. See the lecture slides for further details.
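A quick illustration of this coordinate update, under the same conventions (a 1/2 factor on the squared error and unit-norm columns): one soft-thresholded step exactly minimizes Q along coordinate j, so it never increases the objective. The data and variable names below are ours.

```python
# Illustration of the 3.ii coordinate update on synthetic data: a single
# soft-thresholded step beta_j = S_lam(x_j^T r^(j)) lowers the objective Q.
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 60, 5, 0.3
X = rng.normal(size=(n, p))
X /= np.sqrt((X ** 2).sum(axis=0))            # scale so sum_i x_ij^2 = 1
y = X @ np.array([3.0, -1.5, 0.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=n)

def Q(beta):
    return 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))

beta = np.zeros(p)
before = Q(beta)

j = 0
r_j = y - X @ beta + X[:, j] * beta[j]        # partial residual r^(j)
z = X[:, j] @ r_j                             # x_j^T r^(j)
beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) # soft-threshold update
after = Q(beta)                               # after < before
```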
Problem 4 (Regression Trees, ISL 8.4 [10 points])
4.i Sketch the tree corresponding to the partition of the predictor space illustrated in the left-hand panel of
Figure 1. The numbers inside the boxes indicate the mean of Y within each region.
4.ii Create a diagram similar to the left-hand panel of Figure 1, using the tree illustrated in the right-hand panel
of the same figure. You should divide up the predictor space into the correct regions, and indicate the mean for each region.
Figure 1: Left: A partition of the predictor space corresponding to part a. Right: A tree corresponding to part b.