
Primary Exam Semester 2, 2021

Introduction to Statistical Machine Learning

COMP SCI 3314

Overview of Machine Learning, etc.

Question 1

(a) Please judge the correctness of the following statement. If it is not correct, please give reasons:

True or False To perform cross-validation, we can use all training data to train the model and a subset of training data to assess how the results of learned model will generalise to an unseen data set. [4 marks]
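The following minimal Python sketch shows how the training data is partitioned in k-fold cross-validation. For each fold, it yields a validation subset that is disjoint from the subset used for training. The function name `k_fold_splits` is hypothetical, and shuffling is omitted for brevity.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n samples.

    In every round the validation fold is held out from the data used for training.
    """
    indices = list(range(n))  # in practice these would usually be shuffled first
    # Spread the remainder so that fold sizes differ by at most one
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


for train, val in k_fold_splits(10, 3):
    print("train:", train, "validate:", val)
```

Each index appears in exactly one validation fold and is never in the training part of that same round.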

(b) Which of the following algorithm(s) can be used as supervised machine learning algorithm(s): (1) Kernel Principal Component Analysis, (2) k-means clustering, (3) Support Vector Machine classifier, (4) k-nearest neighbour classifier, (5) Ridge Regression? [4 marks]

(c) Describe the difference between a generative classifier and a discriminative classifier (3 marks), and give one example of a generative classifier and one example of a discriminative classifier (2 marks). [5 marks]

(d) Please judge the correctness of the following statement. If it is not correct, please give reasons:

True or False Increasing k in a k-nearest neighbour classifier can lead to a smoother decision boundary. [3 marks]
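A small experiment can show how k affects the decision boundary. The sketch below is a minimal 1-D k-nearest-neighbour classifier on hypothetical data that contains one noisy point. It counts how often the predicted label changes along a grid of query points. The function names, the data and the tie-breaking rule (ties in distance are broken by training-set order) are assumptions of this sketch.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D, absolute distance).

    sorted() is stable, so distance ties are broken by training-set order.
    """
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def count_boundary_flips(train, k, grid):
    """Number of times the predicted label changes between consecutive grid points."""
    preds = [knn_predict(train, x, k) for x in grid]
    return sum(1 for a, b in zip(preds, preds[1:]) if a != b)


# Hypothetical 1-D data: class -1 on the left, class +1 on the right,
# plus one noisy +1 point at x = 2.5 inside the -1 region
train = [(0, -1), (1, -1), (2, -1), (2.5, 1), (3, -1), (6, 1), (7, 1), (8, 1)]
grid = [i / 10 for i in range(81)]  # query points 0.0, 0.1, ..., 8.0
print(count_boundary_flips(train, 1, grid), count_boundary_flips(train, 3, grid))  # prints: 3 1
```

With k = 1 the noisy point carves out its own small +1 region, giving three label changes. With k = 3 its neighbours outvote it and a single change remains.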

[Total for Question 1: 16 marks]

Support Vector Machines (SVMs)

Question 2

Let $\{(x_i, y_i)\}_{i=1}^{n}$ be the training data for a binary classification problem, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. Let $w \in \mathbb{R}^d$ be the parameter vector, $b \in \mathbb{R}$ the offset, and $\xi_i$ the slack variable for $i = 1, \ldots, n$.

Here the notation $\langle p, q \rangle = p \cdot q$ denotes the inner product of two vectors.

(a) What is wrong with the following primal form of the soft-margin SVM?

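$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

$$\text{s.t.} \quad y_i\left(\langle x_i, w \rangle + b\right) \ge 1 - \xi_i, \quad i = 1, \ldots, n,$$

$$\xi_i \le 0, \quad i = 1, \ldots, n.$$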

[3 marks]

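(b) The dual form of the hard-margin SVM is given below.

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

$$\text{s.t.} \quad \alpha_i \ge 0, \quad i = 1, \ldots, n,$$

$$\sum_{i=1}^{n} \alpha_i y_i = 0.$$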

Answer the following two questions: (1) Express $w$ using the dual variables and the training data (3 marks). (2) Describe how to find the support vectors using only the dual variables $\alpha_i$ (3 marks). [6 marks]
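For the hard-margin dual, the stationarity condition of the Lagrangian gives $w = \sum_{i=1}^{n} \alpha_i y_i x_i$, and the support vectors are the training points with $\alpha_i > 0$. The minimal sketch below performs both computations. The toy data set and the dual solution `alphas` are hypothetical values worked out by hand for this example, not produced by a solver; they satisfy $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. The threshold `tol` is an assumed numerical tolerance.

```python
def weight_vector(alphas, ys, xs):
    """w = sum_i alpha_i * y_i * x_i."""
    d = len(xs[0])
    w = [0.0] * d
    for a, y, x in zip(alphas, ys, xs):
        for k in range(d):
            w[k] += a * y * x[k]
    return w


def support_vector_indices(alphas, tol=1e-8):
    """Indices of training points with alpha_i > 0 (up to a numerical tolerance)."""
    return [i for i, a in enumerate(alphas) if a > tol]


# Hypothetical toy problem; alphas solved by hand for this data
xs = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0)]
ys = [1, -1, 1]
alphas = [0.25, 0.25, 0.0]

w = weight_vector(alphas, ys, xs)
sv = support_vector_indices(alphas)
# Offset from any support vector s, since y_s(<w, x_s> + b) = 1 and y_s^2 = 1
s = sv[0]
b = ys[s] - sum(wk * xk for wk, xk in zip(w, xs[s]))
print(w, sv, b)  # [0.5, 0.5] [0, 1] 0.0
```

The third point lies strictly outside the margin, so its $\alpha_i$ is zero and it does not contribute to $w$.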

(c) Assume that we have a modified version of the standard Support Vector Machine, which has the following primal formulation:

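$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i^2$$

$$\text{s.t.} \quad y_i\left(\langle x_i, w \rangle + b\right) \ge 1 - \xi_i, \quad i = 1, \ldots, n.$$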

i. Explain why the above problem has the same optimal solution with or without the constraint $\xi_i \ge 0$. (4 marks)

ii. Derive the dual formulation of the above primal problem. (8 marks) [12 marks]

[Total for Question 2: 21 marks]

Ensemble Learning and Regression

Question 3

(a) True or False If a classifier can easily achieve 100% training accuracy when trained on the training set, its classification accuracy on the test set can be further boosted by the AdaBoost algorithm. [3 marks]

(b) Please describe how to use Bagging to learn an ensemble of classifiers (1 mark) and why classifiers trained by the Bagging algorithm tend to be different (3 marks). [4 marks]
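Bagging can be sketched as follows. Each base classifier is trained on a bootstrap sample (n draws with replacement from the n training points), and the ensemble predicts by majority vote. In this minimal Python sketch the base learner is a 1-D decision stump, and the data, seed and function names are hypothetical; all of these are illustrative choices, not part of the question.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Return (threshold, sign) of the stump 'sign if x > threshold else -sign'
    with the fewest training errors."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            errors = sum(1 for x, y in zip(xs, ys) if (sign if x > t else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    return best[1], best[2]


def stump_predict(stump, x):
    t, sign = stump
    return sign if x > t else -sign


def bagging(xs, ys, n_models, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn uniformly *with replacement*,
        # so each model sees a different multiset of training points
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    # Majority vote over the ensemble members
    return Counter(stump_predict(m, x) for m in models).most_common(1)[0][0]


xs = [i / 10 for i in range(20)]
ys = [1 if x > 1.0 else -1 for x in xs]
models = bagging(xs, ys, n_models=11)
print(sorted({t for t, _ in models}))  # thresholds learned from different bootstrap samples
print(bagged_predict(models, 0.3), bagged_predict(models, 1.7))
```

Because each stump sees a different bootstrap sample, the learned thresholds can differ from model to model even though the learning rule is the same.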

[Total for Question 3: 12 marks]

Clustering and Kernels

Question 4

(a) True or False The loss function of the k-means algorithm will monotonically decrease with the number of iterations. However, for the kernel k-means algorithm, this is not necessarily true. [3 marks]
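For standard k-means (Lloyd's algorithm), the loss $\sum_i \min_j (x_i - \mu_j)^2$ cannot increase from one iteration to the next. The assignment step and the update step each minimise it while the other quantity is held fixed. The minimal 1-D sketch below records the loss at every iteration on hypothetical data and initial centres. It covers only standard k-means, not the kernel variant.

```python
def kmeans_1d(xs, centers, iters):
    """Lloyd's algorithm in 1-D; returns final centers and the loss at each iteration."""
    centers = list(centers)
    losses = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        clusters = [[] for _ in centers]
        for x in xs:
            j = min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)
            clusters[j].append(x)
        losses.append(sum(min((x - c) ** 2 for c in centers) for x in xs))
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center)
        centers = [sum(cl) / len(cl) if cl else c for cl, c in zip(clusters, centers)]
    return centers, losses


centers, losses = kmeans_1d([1.0, 1.5, 2.0, 8.0, 9.0, 10.0, 25.0], [1.0, 2.0], iters=6)
print(losses)
```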

(b) Please describe at least two advantages of Gaussian-Mixture-Model-based clustering over the k-means clustering algorithm. [4 marks]

(c) Suppose we have two kernels $K_1(\cdot, \cdot)$ and $K_2(\cdot, \cdot)$ such that there are implicit high-dimensional feature maps $\Phi_j : \mathbb{R}^d \to \mathbb{R}^D$ that satisfy $\forall x, z \in \mathbb{R}^d$, $K_j(x, z) = \Phi_j(x) \cdot \Phi_j(z)$, $j = 1, 2$, where $\Phi_j(x) \cdot \Phi_j(z) = \langle \Phi_j(x), \Phi_j(z) \rangle = \sum_{i=1}^{D} \Phi_j(x)_i \, \Phi_j(z)_i$ is the dot product (a.k.a. inner product) in the $D$-dimensional space.

Note that here $\Phi_j(x)_i$ means the $i$-th dimension of the vector $\Phi_j(x)$.

Define $K(x, z) = K_1(x, z) K_2(x, z)$. Will $K(x, z)$ be a valid kernel function?

If the answer is yes, prove it. If the answer is no, explain why. [7 marks]
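A numerical check can make the structure of $K = K_1 K_2$ concrete. The sketch below assumes two specific kernels on $\mathbb{R}^2$ with known explicit feature maps: a linear kernel and the homogeneous quadratic kernel $(x \cdot z)^2$. Both are hypothetical choices for illustration. It compares $K_1(x, z) K_2(x, z)$ with the inner product of feature vectors built from all pairwise products $\Phi_1(x)_a \Phi_2(x)_b$. A single numerical check is not a proof; it only illustrates the construction.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi1(x):
    # Hypothetical K1: linear kernel, so the feature map is the identity
    return list(x)


def phi2(x):
    # Hypothetical K2: (x . z)^2 on R^2, with explicit map (x1^2, sqrt(2) x1 x2, x2^2)
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


def phi_product(x):
    # All pairwise products Phi1(x)_a * Phi2(x)_b, i.e. the flattened outer product
    return [a * b for a in phi1(x) for b in phi2(x)]


def k1(x, z):
    return dot(phi1(x), phi1(z))


def k2(x, z):
    return dot(phi2(x), phi2(z))


x, z = (1.0, 2.0), (3.0, 0.5)
lhs = k1(x, z) * k2(x, z)
rhs = dot(phi_product(x), phi_product(z))
print(lhs, rhs, math.isclose(lhs, rhs))
```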

[Total for Question 4: 14 marks]

Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)

Question 5

This problem discusses two linear dimensionality reduction methods: principal component analysis (PCA) and linear discriminant analysis (LDA).

(a) LDA reduces the dimensionality given labels by maximising the overall interclass variance relative to the intraclass variance. Sketch (roughly) the directions of the first PCA component and the first LDA component in the following figure. In the figure, squares and circles represent two different classes of data points.
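The two criteria can pick different directions. The sketch below computes the first PCA direction (the top eigenvector of the pooled scatter matrix, ignoring labels) and the Fisher LDA direction $w \propto S_W^{-1}(m_2 - m_1)$. It uses two hypothetical 2-D classes that are elongated horizontally and separated vertically. The data are invented for illustration and are not the points in the exam figure. On them, PCA picks a roughly horizontal direction and LDA a roughly vertical one.

```python
import math


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix sum_i (p_i - m)(p_i - m)^T."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def normalize(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)


def first_pca_direction(points):
    # Top eigenvector of the symmetric scatter matrix [[a, b], [b, c]] (labels ignored)
    a, b, c = scatter(points, mean(points))
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if abs(b) > 1e-12:
        return normalize((lam - c, b))
    return (1.0, 0.0) if a >= c else (0.0, 1.0)


def fisher_lda_direction(class1, class2):
    # w proportional to S_W^{-1} (m2 - m1), with S_W the pooled within-class scatter
    m1, m2 = mean(class1), mean(class2)
    a1, b1, c1 = scatter(class1, m1)
    a2, b2, c2 = scatter(class2, m2)
    a, b, c = a1 + a2, b1 + b2, c1 + c2
    det = a * c - b * b  # assumed non-zero (S_W invertible)
    dx, dy = m2[0] - m1[0], m2[1] - m1[1]
    # [[a, b], [b, c]]^{-1} = [[c, -b], [-b, a]] / det
    return normalize(((c * dx - b * dy) / det, (-b * dx + a * dy) / det))


# Hypothetical data: both classes spread along x, separated along y
squares = [(-3, -0.2), (-1, 0.2), (1, 0.2), (3, -0.2)]
circles = [(-3, 0.8), (-1, 1.2), (1, 1.2), (3, 0.8)]
pca = first_pca_direction(squares + circles)
lda = fisher_lda_direction(squares, circles)
print("PCA direction:", pca)
print("LDA direction:", lda)
```

PCA follows the largest overall spread, which here lies within the classes. LDA instead weighs the gap between class means against the within-class spread.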

[Total for Question 5: 18 marks]

Neural Networks and Semi-supervised Learning

Question 6

(a) Please describe at least one advantage of deep learning over traditional machine learning approaches, for example, support vector machines. [3 marks]

(b) When training a convolutional neural network to recognise handwritten digits, one finds that performance on the training set is very good while performance on the validation set is unacceptably low. A reasonable fix might be to:

(Select the answer(s) that could be the solution(s), and briefly explain why.)

(A) Reduce the training set size, and increase the validation set size.

(B) Increase the number of layers and neurons.

(D) Train longer with more iterations [2 marks]

(c) Please briefly describe one semi-supervised learning approach (1 mark) and why it can build a stronger model by using unlabelled data (2 marks). [3 marks]

(d) We use the following convolutional neural network to classify a set of 32x32 colour images, that is, the input size is 32x32x3:

1) Layer 1: convolutional layer with the ReLU nonlinear activation function, 100 5x5 filters with stride 2.

2) Layer 2: 2x2 max-pooling layer

3) Layer 3: convolutional layer with the ReLU nonlinear activation function, 50 3x3 filters with stride 1.

4) Layer 4: 2x2 max-pooling layer

5) Layer 5: fully-connected layer

6) Layer 6: classification layer

How many parameters are in the first layer (4 marks), the second layer (3 marks) and the third layer (assume a bias term is used) (4 marks)? [11 marks]
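The parameter count of a convolutional layer depends only on the filter size, the input depth and the number of filters, not on the stride or padding. The minimal sketch below works through the arithmetic. It assumes square filters and one bias per filter, and it assumes that layer 3's input depth equals the 100 feature maps produced by layer 1, since max-pooling keeps the depth unchanged. The function names are hypothetical.

```python
def conv_params(kernel_size, in_channels, num_filters, bias=True):
    """Learnable parameters of a 2-D convolutional layer with square filters."""
    per_filter = kernel_size * kernel_size * in_channels + (1 if bias else 0)
    return num_filters * per_filter


def max_pool_params():
    # Max-pooling takes a maximum over each window: nothing to learn
    return 0


layer1 = conv_params(5, 3, 100)   # input depth 3 (RGB)
layer2 = max_pool_params()
layer3 = conv_params(3, 100, 50)  # input depth = 100 feature maps from layer 1
print(layer1, layer2, layer3)  # 7600 0 45050
```

If the first layer is taken to have no bias terms, `conv_params(5, 3, 100, bias=False)` gives 7500 instead.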

[Total for Question 6: 19 marks]
