
Statistical Machine Learning GR5241

Spring 2023

Homework 4

Due: Tuesday, April 18th, 2023

Homework submission:  Please submit your homework electronically through Canvas by 11:59pm on the due date. You need to submit both the pdf file and your code (either in R or Python).

Problem 1 (SVM and CV [30 points])

In this problem, we will apply a support vector machine to classify hand-written digits. You do not have to implement the SVM algorithm or cross-validation manually: the R library e1071 provides an implementation, see

http://cran.r-project.org/web/packages/e1071/index.html

Download the digit data set from Canvas. The zip archive contains two text files. Each file contains a matrix with one data point (a vector of length 256) per row, and each 256-vector represents a 16 × 16 image of a handwritten digit. The data contains two classes, the digits 5 and 6, so they can be labeled as −1 and +1, respectively. The image accompanying this problem shows the first row, re-arranged as a 16 × 16 matrix and plotted as a grayscale image.

Homework questions:

1.i  Randomly select about 20% of the data and set it aside as a test set.

1.ii Train a linear SVM with soft margin. Cross-validate the margin parameter.

1.iii Train an SVM with soft margin and RBF kernel. You will have to cross-validate both the soft-margin parameter and the kernel bandwidth.

1.iv After you have selected parameter values for both algorithms, train each one with the parameter values you have chosen. Then compute the misclassification rate (the proportion of misclassified data points) on the test set.

1.v Plot the cross-validation estimates of the misclassification rate. Please plot the rate as

(a) a function of the margin parameter in the linear case;

(b) a function of the margin parameter and the kernel bandwidth in the non-linear case (you are encouraged to use a heat map here).

1.vi Report the test-set estimates of the misclassification rates for both cases, with the parameter values you have selected, and compare the two results. Is a linear SVM a good choice for this data, or should we use a non-linear one?
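For reference, here is a minimal R sketch of one way to set this pipeline up with e1071. The file names digits5.txt and digits6.txt, the seed, and the parameter grids are illustrative assumptions, not part of the assignment; substitute the actual file names from the Canvas zip archive.

library(e1071)

# Assumed file names: one text file of 256-vectors per digit class.
X5 <- as.matrix(read.table("digits5.txt"))
X6 <- as.matrix(read.table("digits6.txt"))
X  <- rbind(X5, X6)
y  <- factor(c(rep(-1, nrow(X5)), rep(+1, nrow(X6))))

# 1.i: set aside roughly 20% of the data as a test set.
set.seed(5241)
test <- sample(nrow(X), round(0.2 * nrow(X)))

# 1.ii: cross-validate the soft-margin parameter (cost) of a linear SVM.
lin.tune <- tune(svm, train.x = X[-test, ], train.y = y[-test],
                 kernel = "linear",
                 ranges = list(cost = 10^seq(-3, 2, by = 0.5)),
                 tunecontrol = tune.control(cross = 10))

# 1.iii: cross-validate both cost and bandwidth (gamma) for the RBF kernel.
rbf.tune <- tune(svm, train.x = X[-test, ], train.y = y[-test],
                 kernel = "radial",
                 ranges = list(cost  = 10^seq(-3, 2, by = 0.5),
                               gamma = 10^seq(-5, 0, by = 0.5)),
                 tunecontrol = tune.control(cross = 10))

# 1.iv: test-set misclassification rates for the selected models.
mean(predict(lin.tune$best.model, X[test, ]) != y[test])
mean(predict(rbf.tune$best.model, X[test, ]) != y[test])

The cross-validation error estimates needed for 1.v are stored in lin.tune$performances and rbf.tune$performances; plot(lin.tune) draws the one-parameter error curve, and plot(rbf.tune) draws the two-parameter error surface.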

Problem 2 (Penalized Logistic Regression Lasso [40 points])

In this problem we will apply $\ell_1$-penalized logistic regression to classify hand-written digits. This is the same two-class dataset from Problem 1, i.e., classify the digits 5 and 6, now labeled as 0 and 1, respectively. In contrast to Problem 1, students are required to code the estimation algorithm manually. The objective function of interest is

$$Q(\beta_0, \beta_1, \cdots, \beta_p) = L(\beta_0, \beta_1, \cdots, \beta_p) + \lambda \sum_{j=1}^{p} |\beta_j|,$$

where $L(\beta_0, \beta_1, \cdots, \beta_p)$ is the negative log-likelihood of the logistic regression model and $\lambda > 0$ is the tuning parameter. To solve this problem, students must come up with an iterative algorithm that minimizes $Q(\beta)$ with respect to $\beta = (\beta_0\ \beta_1\ \cdots\ \beta_p)^T$. Students should understand how the lasso can be estimated using cost functions other than least squares, e.g., in this case, the negative log-likelihood. The cyclic coordinate descent algorithm from lasso linear regression is a good starting point; another good starting point is described on page 126 of the ESL text.
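As a hedged starting point (a sketch, not a reference solution), the following R function combines the two suggestions above: each outer pass forms the IRLS quadratic approximation to the negative log-likelihood, then runs one cycle of coordinate descent with soft thresholding on the resulting weighted least-squares problem. The function name logistic.lasso, the fixed iteration count, and the 1/n scaling of the log-likelihood (chosen to match glmnet's convention in 2.iii) are all assumptions.

# Soft-thresholding operator.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Minimize (1/n) * negative log-likelihood + lambda * sum(|beta_j|);
# the intercept beta0 is left unpenalized.
logistic.lasso <- function(X, y, lambda, n.iter = 100) {
  n <- nrow(X); p <- ncol(X)
  beta0 <- 0; beta <- rep(0, p)
  for (it in 1:n.iter) {
    eta  <- drop(beta0 + X %*% beta)
    prob <- 1 / (1 + exp(-eta))
    w    <- pmax(prob * (1 - prob), 1e-5)   # IRLS weights, floored for stability
    z    <- eta + (y - prob) / w            # IRLS working response
    # One cycle of coordinate descent on the weighted quadratic surrogate.
    beta0 <- sum(w * (z - X %*% beta)) / sum(w)
    for (j in 1:p) {
      r.j     <- z - beta0 - X[, -j, drop = FALSE] %*% beta[-j]  # partial residual
      beta[j] <- soft(sum(w * X[, j] * r.j), n * lambda) / sum(w * X[, j]^2)
    }
  }
  c(beta0, beta)
}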

Homework questions:

2.i Write down your logistic-lasso algorithm used to estimate the parameters $\beta_0, \beta_1, \cdots, \beta_p$.

2.ii Explain in detail how you came up with your logistic-lasso algorithm.

2.iii Implement your logistic-lasso algorithm on the toy dataset LogisticLasso.csv using tuning parameter $\lambda = .05$. Display the estimated coefficients and compare your results to the glmnet() output (or similar). For reference, the glmnet() code is displayed below:

library(glmnet)

my.lasso.data <- read.csv(file = "LogisticLasso.csv")
gml.model <- glmnet(x = as.matrix(my.lasso.data[, 1:10]), y = my.lasso.data[, 11],
                    family = "binomial", alpha = 1, intercept = TRUE, lambda = .05)
c(gml.model$a0, gml.model$beta[, 1])
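If you also have a manual implementation (e.g., the hypothetical logistic.lasso() sketched earlier), a side-by-side comparison is a convenient sanity check. Do not expect exact agreement, in part because glmnet() standardizes the features internally by default:

manual.fit <- logistic.lasso(as.matrix(my.lasso.data[, 1:10]),
                             my.lasso.data[, 11], lambda = .05)
cbind(glmnet = c(gml.model$a0, gml.model$beta[, 1]), manual = manual.fit)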

2.iv Implement your logistic-lasso algorithm on the zip code dataset from Problem 1. Choose the tuning parameter to be $\lambda = .005$. Display the first 30 (out of 257) estimated coefficients for both your model and the glmnet() function (or similar).

Some notes:

• This is a hard problem. You can still receive full credit even if your output does not perfectly match the glmnet() output.

• Try your best!

Problem 3 (Basic Theory Related to the Lasso [20 points])

3.i Consider the univariate lasso objective function with no bias:

$$Q(\beta) = \frac{1}{2} \sum_{i=1}^{n} (y_i - x_i \beta)^2 + \lambda |\beta|.$$

Also suppose $x$ is scaled using the formula

$$x_i := \frac{x_i}{\sqrt{\sum_{i'=1}^{n} x_{i'}^2}}, \qquad i = 1, 2, \ldots, n,$$

so that $\sum_{i=1}^{n} x_i^2 = 1$. Derive a closed-form expression for the lasso solution, i.e., show that $Q(\beta)$ is minimized at

$$\hat{\beta} = \begin{cases} \sum_{i} x_i y_i - \lambda & \text{if } \sum_{i} x_i y_i > \lambda, \\ 0 & \text{if } -\lambda \le \sum_{i} x_i y_i \le \lambda, \\ \sum_{i} x_i y_i + \lambda & \text{if } \sum_{i} x_i y_i < -\lambda. \end{cases}$$
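As a hint toward the derivation (this covers only one of the three cases), note that the scaling makes $\sum_i x_i^2 = 1$ and that for $\beta > 0$ the objective is differentiable:

% Case beta > 0: set the ordinary derivative to zero.
\frac{dQ}{d\beta}
  = -\sum_{i=1}^{n} x_i (y_i - x_i \beta) + \lambda
  = \beta \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i y_i + \lambda
  = \beta - \sum_{i=1}^{n} x_i y_i + \lambda = 0,
% so beta-hat = sum_i x_i y_i - lambda, which is consistent with the
% assumption beta > 0 exactly when sum_i x_i y_i > lambda.

The case $\beta < 0$ is symmetric, and the middle case follows from the subgradient of $|\beta|$ at zero.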

3.ii Consider the multivariate lasso objective function with no bias:

$$Q(\beta) = Q(\beta_1, \beta_2, \ldots, \beta_p) = \frac{1}{2} \sum_{i=1}^{n} \Big(y_i - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|.$$

Also suppose that the $j$th feature $x_j$ is scaled using the formula

$$x_{ij} := \frac{x_{ij}}{\sqrt{\sum_{i'=1}^{n} x_{i'j}^2}}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, p.$$

Solve the stationarity condition $\partial Q / \partial \beta_j = 0$ for $\beta_j$. Your final answer should be

$$\hat{\beta}_j = S_\lambda\big(x_j^T r^{(j)}\big),$$

where $x_j$ is the $j$th feature, $r^{(j)}$ is the partial residual, and $S_\lambda$ is the soft-thresholding operator. See the lecture slides for further details.
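For concreteness (this just restates the standard setup from lecture), the partial residual removes the fit of every coordinate except the $j$th, which reduces each coordinate update to the univariate problem of part 3.i:

% Partial residual for coordinate j, and the resulting univariate objective.
r_i^{(j)} = y_i - \sum_{k \neq j} \beta_k x_{ik},
\qquad
Q(\beta_j) = \frac{1}{2} \sum_{i=1}^{n} \bigl( r_i^{(j)} - x_{ij} \beta_j \bigr)^2
             + \lambda |\beta_j| + \text{const.}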

Problem 4 (Regression Trees, ISL 8.4 [10 points])

4.i Sketch the tree corresponding to the partition of the predictor space illustrated in the left-hand panel of Figure 1. The numbers inside the boxes indicate the mean of Y within each region.

4.ii Create a diagram similar to the left-hand panel of Figure 1, using the tree illustrated in the right-hand panel of the same figure. You should divide the predictor space into the correct regions and indicate the mean for each region.

Figure 1: Left: A partition of the predictor space corresponding to part (a). Right: A tree corresponding to part (b).