Machine Learning and Intelligent Data Analysis

Main Summer Examinations 2021

Question 1 Dimensionality Reduction

(a) Explain what is meant by “dimensionality reduction” and why it is sometimes necessary. [4 marks]

(b) Consider the following dataset of four sample points

Calculate the principal components of this dataset. Show all of your working. [6 marks]

(c) What does principal component analysis (PCA) tell you about the nature of a multivariate dataset? Explain how it can be used for dimensionality reduction. [4 marks]

(d) What are the limitations of PCA and what other dimensionality reduction technique may be used instead? [2 marks]

(e) You are given a dataset consisting of 100 measurements, each of which has 10 variables. The eigenvalues of the covariance matrix are shown in the following table:

Eigenvalue number	1	2	3	4	5	6	7	8	9	10
Eigenvalue	1382.0	508.4	187.0	68.8	25.3	9.3	3.4	1.3	0.46	0.17

What can you say about the underlying nature of this dataset? [4 marks]
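As a rough aid for part (e), the share of total variance carried by each principal component can be computed directly from the tabulated eigenvalues. The sketch below is only an illustration: the function name `explained_variance` and the variable names are hypothetical, and it assumes each eigenvalue equals the variance along its component.

```python
# Eigenvalues copied from the table in part (e).
eigenvalues = [1382.0, 508.4, 187.0, 68.8, 25.3, 9.3, 3.4, 1.3, 0.46, 0.17]


def explained_variance(eigs):
    """Return per-component and cumulative fractions of the total variance."""
    total = sum(eigs)
    fractions = [e / total for e in eigs]
    cumulative = []
    running = 0.0
    for f in fractions:
        running += f
        cumulative.append(running)
    return fractions, cumulative


fractions, cumulative = explained_variance(eigenvalues)

# Ratio of each eigenvalue to the next one, to see how quickly they decay.
ratios = [a / b for a, b in zip(eigenvalues, eigenvalues[1:])]

for k, (f, c) in enumerate(zip(fractions, cumulative), start=1):
    print(f"PC{k}: {f:6.2%} of variance, {c:7.2%} cumulative")
print("successive ratios:", [round(r, 2) for r in ratios])
```

The cumulative column shows how many components are needed to retain a chosen fraction of the variance, and the ratios show how regularly the eigenvalues fall off.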

Question 2 Classification

(a) Consider the Soft Margin Support Vector Machine learnt in Lecture 4e. Consider also that C = 100 and that we are adopting a linear kernel, i.e., $k(x^{(i)}, x^{(j)}) = {x^{(i)}}^T x^{(j)}$. Assume an illustrative binary classification problem with the following training examples:

$x^{(1)} = (0.3, 0.3)^T$, $y^{(1)} = 1$

$x^{(2)} = (0.6, 0.6)^T$, $y^{(2)} = 1$

$x^{(3)} = (0.6, 0.3)^T$, $y^{(3)} = -1$

$x^{(4)} = (0.9, 0.6)^T$, $y^{(4)} = -1$

Which of the sets of Lagrange multipliers below is (are) a plausible solution for this problem? Justify your answer.

(i) $a^{(1)} = 0$, $a^{(2)} = 2$, $a^{(3)} = 2$, $a^{(4)} = 10$

(ii) $a^{(1)} = 0$, $a^{(2)} = 44$, $a^{(3)} = 22$, $a^{(4)} = 22$

(iii) $a^{(1)} = 0$, $a^{(2)} = 200$, $a^{(3)} = 100$, $a^{(4)} = 100$

[6 marks]
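One way to screen the candidates in part (a) is to test them against the two constraints of the standard soft-margin dual problem: the box constraint 0 ≤ a(i) ≤ C and the balance constraint Σ a(i) y(i) = 0. The sketch below assumes Lecture 4e uses this standard formulation (the lecture itself is not reproduced here), and the helper name `dual_feasible` is hypothetical. These are necessary conditions only, so passing them does not by itself prove optimality.

```python
C = 100.0
y = [1, 1, -1, -1]  # labels y(1)..y(4) from the question

# Candidate multipliers a(1)..a(4) copied from options (i) to (iii).
candidates = {
    "i": [0, 2, 2, 10],
    "ii": [0, 44, 22, 22],
    "iii": [0, 200, 100, 100],
}


def dual_feasible(a, y, C, tol=1e-9):
    """Return (box_ok, balance_ok) for the standard soft-margin dual constraints."""
    box_ok = all(-tol <= a_i <= C + tol for a_i in a)
    balance_ok = abs(sum(a_i * y_i for a_i, y_i in zip(a, y))) <= tol
    return box_ok, balance_ok


results = {name: dual_feasible(a, y, C) for name, a in candidates.items()}
for name, (box_ok, balance_ok) in results.items():
    print(f"({name}) 0 <= a <= C: {box_ok}, sum a*y == 0: {balance_ok}")
```

A candidate that fails either check cannot be a solution of the dual; a full justification would still need to discuss which examples act as support vectors.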

(b) Consider a binary classification problem where around 5% of the training examples are likely to have their labels incorrectly assigned (i.e., assigned as 0 when the true label was 1, and vice-versa). Which value of k for k-Nearest Neighbours is likely to be better suited for this problem: k = 1 or k = 3? Justify your answer. [6 marks]

(c) Consider a binary classification problem where you wish to predict whether a piece of machinery is likely to contain a defect. For this problem, 0.5% of the training examples belong to the defective class, whereas 99.5% belong to the non-defective class. When adopting Naïve Bayes for this problem, the non-defective class may almost always be the predicted class, even when the true class is the defective class. Explain why and propose a method to alleviate this issue. [8 marks]
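The effect described in part (c) can be illustrated numerically. Naïve Bayes predicts the class with the largest product of prior and likelihood, so with priors of 0.005 and 0.995 the likelihood of the defective class must exceed that of the non-defective class by a factor of more than 199 before the defective class wins. The sketch below is a toy illustration: the likelihood values are made up, the function name `predict` is hypothetical, and equalising the priors is only one of several possible remedies.

```python
PRIOR_DEFECT = 0.005
PRIOR_OK = 0.995

# Likelihood ratio the evidence must exceed before "defective" is predicted.
threshold = PRIOR_OK / PRIOR_DEFECT


def predict(lik_defect, lik_ok, p_defect=PRIOR_DEFECT):
    """Pick the class with the larger prior * likelihood (two-class case)."""
    score_defect = p_defect * lik_defect
    score_ok = (1.0 - p_defect) * lik_ok
    return "defective" if score_defect > score_ok else "non-defective"


# Made-up likelihoods: the evidence favours "defective" by a factor of 50.
lik_defect, lik_ok = 0.5, 0.01

with_data_priors = predict(lik_defect, lik_ok)             # priors from the data
with_equal_priors = predict(lik_defect, lik_ok, p_defect=0.5)  # rebalanced priors

print("threshold on likelihood ratio:", round(threshold, 1))
print("priors from data :", with_data_priors)
print("equal priors     :", with_equal_priors)
```

Here the same evidence yields different predictions depending only on the priors, which is the behaviour the question asks about.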

Question 3 Document Analysis

(a) In a small universe of five web pages, one has a PageRank of 0.4. What does this tell us about this page? [2 marks]

(b) Compare and contrast the TF-IDF and word2vec approaches to document vectorisation. You should explain the essential principles of each method, and highlight their respective advantages and disadvantages. [8 marks]

(c) One possible approach to searching a large linked set of documents is to combine a measure of document similarity such as TF-IDF similarity with a measure of a page’s importance such as that provided by PageRank. Suggest how this could be done. [10 marks]
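One possible reading of part (c) can be sketched in code: score each document by a weighted sum of its TF-IDF cosine similarity to the query and its normalised PageRank. Everything below is an assumption made for illustration, not a prescribed answer: the toy documents and links, the weight `alpha`, the smoothed IDF formula, the damping factor 0.85, and all function names are hypothetical choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs maps name -> token list. Returns sparse TF-IDF vectors and the IDF table."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}  # smoothed IDF (assumption)
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {t: (c / len(tokens)) * idf[t] for t, c in tf.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pagerank(links, damping=0.85, iters=100):
    """Power iteration; links maps page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages  # dangling page: spread rank evenly
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank


def search(query_tokens, docs, links, alpha=0.7):
    """Rank matching documents by alpha * similarity + (1 - alpha) * normalised PageRank."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(t for t in query_tokens if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_tf.items()}
    rank = pagerank(links)
    max_rank = max(rank.values())
    scores = {}
    for name, vec in vectors.items():
        sim = cosine(q_vec, vec)
        if sim > 0.0:  # keep only documents that match the query at all
            scores[name] = alpha * sim + (1.0 - alpha) * rank[name] / max_rank
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus and link graph, invented for this example.
docs = {
    "a": "pca reduces dimensionality".split(),
    "b": "pca and svm notes".split(),
    "c": "cooking recipes".split(),
}
links = {"a": ["b"], "b": ["a"], "c": ["a", "b"]}

results = search(["pca"], docs, links)
print(results)
```

Filtering on similarity first and then blending in PageRank keeps irrelevant but popular pages out of the results; other combinations, such as multiplying the two scores, would be equally defensible.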