
School of Computer Science

EXAMINATION

Semester 1 - Main, 2021

COMP5318 Machine Learning and Data Mining

Question 1 [13 marks]

Select the correct answer and provide a brief explanation:

1. [2 marks] The figure below shows a training set of 8 examples described with one numerical feature x and belonging to two classes: circles and squares. A new example is shown with a blue triangle.

What will be the prediction of 3-Nearest Neighbor for the class of the new example? If there are ties, settle them by choosing the example on the left.

Circle                  Square

Explanation:

2. [2 marks] Does this diagram correspond to Bagging or Boosting?

Bagging                 Boosting

Explanation:

3. [3 marks] The kernel trick in support vector machines ensures that the data will be linearly separable in the new space.

True                    False

Explanation:

4. [3 marks] Given is the following training data, where occupation, age and loan-salary-ratio are the features and outcome is the class. Two prediction models, Model 1 and Model 2, are built, both consistent with the training data.

occupation      age    loan-salary-ratio    outcome
industrial      39     3.40                 default
industrial      22     4.02                 default
professional    30     2.70                 repay
professional    27     3.32                 default
professional    40     2.04                 repay
professional    50     6.95                 default
industrial      27     3.00                 repay
industrial      33     2.60                 repay
industrial      30     4.50                 default
professional    45     2.78                 repay

Model 1:

if loan-salary-ratio > 3.00 then outcome = default

else outcome = repay

Model 2:

if age = 50 then outcome = default

else if age = 39 then outcome = default

else if age = 30 and occupation = industrial then outcome = default

else if age = 27 and occupation = professional then outcome = default

else outcome = repay
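The two rule sets can be written as small functions and checked against the training table. The following is a minimal sketch, not part of the original paper: the names `TRAINING_DATA`, `model_1`, `model_2` and `mismatches` are hypothetical, and the rows are copied from the table as transcribed here.

```python
# Training rows copied from the table above:
# (occupation, age, loan-salary-ratio, outcome).
TRAINING_DATA = [
    ("industrial", 39, 3.40, "default"),
    ("industrial", 22, 4.02, "default"),
    ("professional", 30, 2.70, "repay"),
    ("professional", 27, 3.32, "default"),
    ("professional", 40, 2.04, "repay"),
    ("professional", 50, 6.95, "default"),
    ("industrial", 27, 3.00, "repay"),
    ("industrial", 33, 2.60, "repay"),
    ("industrial", 30, 4.50, "default"),
    ("professional", 45, 2.78, "repay"),
]


def model_1(occupation, age, ratio):
    """Model 1: a single threshold on loan-salary-ratio."""
    return "default" if ratio > 3.00 else "repay"


def model_2(occupation, age, ratio):
    """Model 2: a rule list keyed on specific ages, as written in the question."""
    if age == 50:
        return "default"
    if age == 39:
        return "default"
    if age == 30 and occupation == "industrial":
        return "default"
    if age == 27 and occupation == "professional":
        return "default"
    return "repay"


def mismatches(model):
    """Return the training rows whose outcome the model does not reproduce."""
    return [row for row in TRAINING_DATA if model(*row[:3]) != row[3]]


if __name__ == "__main__":
    for name, model in (("Model 1", model_1), ("Model 2", model_2)):
        print(name, "mismatches:", mismatches(model))
```

On the table as transcribed here, this check reports no mismatches for Model 1 and one for Model 2 (the industrial, age 22 row), which may be a transcription artifact rather than a property of the original paper.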

Which of these two models is more likely to generalize better on new examples?

Model 1          Model 2

Explanation:

5. [3 marks] At each step, PRISM selects the best attribute by considering all classes.

True                    False

Explanation:


Question 2 [10 marks]

Given is the following training data, where city and season are the features and price is the class:

city         season    price
Madrid       summer    high
Barcelona    spring    medium
Madrid       spring    medium
Barcelona    summer    high
Bilbao       winter    medium
Sevilla      spring    high
Sevilla      winter    medium
Bilbao       summer    medium

Use Naïve Bayes to predict the value of price for the following new example: city=Sevilla, season=summer. Show your calculations.
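A hand calculation for this question can be cross-checked with a short script. The sketch below assumes plain frequency estimates with no smoothing (the question does not specify any), and the names `ROWS` and `naive_bayes_scores` are hypothetical. It returns the unnormalised score P(class) * P(city | class) * P(season | class) for each class as an exact fraction.

```python
from collections import Counter
from fractions import Fraction

# Training rows copied from the table above: (city, season, price).
ROWS = [
    ("Madrid", "summer", "high"),
    ("Barcelona", "spring", "medium"),
    ("Madrid", "spring", "medium"),
    ("Barcelona", "summer", "high"),
    ("Bilbao", "winter", "medium"),
    ("Sevilla", "spring", "high"),
    ("Sevilla", "winter", "medium"),
    ("Bilbao", "summer", "medium"),
]


def naive_bayes_scores(rows, example):
    """Return the unnormalised Naive Bayes score of each class for `example`.

    Assumption: probabilities are raw relative frequencies (no Laplace
    smoothing), so a feature value unseen within a class gives a score of 0.
    """
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, len(rows))  # prior P(class)
        for i, value in enumerate(example):
            matches = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= Fraction(matches, count)  # likelihood P(feature_i = value | class)
        scores[label] = score
    return scores


if __name__ == "__main__":
    scores = naive_bayes_scores(ROWS, ("Sevilla", "summer"))
    for label, score in scores.items():
        print(label, score)
    print("prediction:", max(scores, key=scores.get))
```

Using `Fraction` keeps the intermediate values in the same form as a written solution, which makes it easier to compare the two step by step.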

Question 3 [10 marks]

Given is the following training data, where restaurant and time are the features and price is the class:

restaurant    time         price
casual        dinner       high
casual        lunch        medium
family        lunch        medium
family        dinner       high
cafe          lunch        high
cafe          breakfast    medium
fast          breakfast    medium
fast          dinner       medium

You may use this table:

x    y    -(x/y)*log2(x/y)    x    y    -(x/y)*log2(x/y)
1    2    0.50                1    7    0.40
1    3    0.53                2    7    0.52
2    3    0.39                3    7    0.52
1    4    0.50                4    7    0.46
3    4    0.31                5    7    0.35
1    5    0.46                6    7    0.19
2    5    0.53                1    8    0.38
3    5    0.44                3    8    0.53
4    5    0.26                5    8    0.42
1    6    0.43                7    8    0.17
5    6    0.22

a) What is the entropy of this data set with respect to the class?

b) What is the information gain of restaurant? Show your calculations.
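The entropy and information gain asked for in parts a) and b) can be verified numerically. The sketch below is one way to do so, assuming the standard definitions (entropy in bits, and gain as the class entropy minus the weighted entropy of the subsets produced by splitting on a feature). The names `ROWS`, `entropy` and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2

# Training rows copied from the table above: (restaurant, time, price).
ROWS = [
    ("casual", "dinner", "high"),
    ("casual", "lunch", "medium"),
    ("family", "lunch", "medium"),
    ("family", "dinner", "high"),
    ("cafe", "lunch", "high"),
    ("cafe", "breakfast", "medium"),
    ("fast", "breakfast", "medium"),
    ("fast", "dinner", "medium"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature_index):
    """Class entropy minus the weighted entropy after splitting on one feature."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[feature_index] for row in rows}:
        subset = [row[-1] for row in rows if row[feature_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


if __name__ == "__main__":
    print("entropy of price:", round(entropy([row[-1] for row in ROWS]), 4))
    print("gain of restaurant:", round(information_gain(ROWS, 0), 4))
```

Results computed this way use unrounded logarithms, so they can differ in the second decimal place from a hand calculation based on the rounded lookup table.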

Question 4 [13 marks]

1. [4 marks] There are 100 students in a computer science course. Isabella consistently outperforms the other students on the assessments during the semester and, on the final exam, gets a mark of 99, while the next highest mark is 75. The exam marks range from 5 to 99. We would like to fit a linear regression model to the exam marks. Would Isabella’s mark cause problems? Briefly explain your answer.

2. [2 marks] List one advantage of Lasso regression compared to the standard linear regression and briefly explain your answer.

3. [2 marks] In random forest, how is the correlation among the combined decision trees reduced?

4. [5 marks] Consider the task of predicting credit card fraud in real time using a machine learning classifier. This task requires that the classifier performs thousands of predictions per second. Which algorithm is more suitable: k-nearest neighbor or logistic regression? Explain your answer.

Question 5 [14 marks]

1.  [8 marks] A company is building a classifier to predict if customers will like new products. The classifier takes as an input a vector with a very high dimensionality, has to be trained on a very large dataset and also has to