INFS4203/7203 Data Mining Tutorial Week 3 Semester 2, 2023
Tutorial Week 3 - Decision Tree
Question 1/1: Decision Tree
A university hired a cyber security expert, Alice, to identify potential cyber attacks on its network. Over the course of one week, Alice recorded the most frequently visited website and the number of failed login attempts for ten users, along with a label indicating whether the user had performed any attacks on the network. The data can be found in Table 1.
Table 1
User | Website | Login Attempts | Label
1 | google.com | 0 | Benign
2 | google.com | 4 | Benign
3 | google.com | 20 | Benign
4 | google.com | 30 | Attack
5 | reddit.com | 3 | Benign
6 | reddit.com | 32 | Attack
7 | reddit.com | 29 | Attack
8 | howtohack.com | 3 | Benign
9 | howtohack.com | 10 | Attack
10 | howtohack.com | 47 | Attack
To reduce its susceptibility to attacks, the university wishes to use this data to build a classifier that can detect future attacks.
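As a starting point for Question 1, the sketch below computes the information gain of each candidate root split on Table 1. It assumes a multiway split on Website (one branch per site) and binary splits of the form Login Attempts ≤ t, with t taken at the midpoint between consecutive distinct login values; other thresholding schemes are possible. The names `DATA`, `entropy`, `information_gain`, `website_gain` and `login_gains` are illustrative and not part of the tutorial.

```python
from collections import Counter
from math import log2

# Table 1 as (website, failed login attempts, label) tuples.
DATA = [
    ("google.com", 0, "Benign"),
    ("google.com", 4, "Benign"),
    ("google.com", 20, "Benign"),
    ("google.com", 30, "Attack"),
    ("reddit.com", 3, "Benign"),
    ("reddit.com", 32, "Attack"),
    ("reddit.com", 29, "Attack"),
    ("howtohack.com", 3, "Benign"),
    ("howtohack.com", 10, "Attack"),
    ("howtohack.com", 47, "Attack"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    children = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - children


def website_gain(rows):
    """Gain of a multiway split with one branch per distinct website."""
    labels = [r[2] for r in rows]
    branches = {}
    for site, _, label in rows:
        branches.setdefault(site, []).append(label)
    return information_gain(labels, list(branches.values()))


def login_gains(rows):
    """Gain of each binary split 'login <= t', where t is the midpoint
    between consecutive distinct login values (an assumed scheme)."""
    labels = [r[2] for r in rows]
    values = sorted({r[1] for r in rows})
    gains = {}
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r[2] for r in rows if r[1] <= t]
        right = [r[2] for r in rows if r[1] > t]
        gains[t] = information_gain(labels, [left, right])
    return gains


if __name__ == "__main__":
    print(f"Website: IG = {website_gain(DATA):.4f}")
    for t, g in login_gains(DATA).items():
        print(f"Login Attempts <= {t}: IG = {g:.4f}")
```

Under these assumptions, Website gives a gain of about 0.125, while Login Attempts ≤ 7 and Login Attempts ≤ 24.5 tie for the highest gain of about 0.610, so the root splits on Login Attempts. Applying the same functions to each child subset grows the rest of the tree; ties like this one are one reason Question 7 admits more than one tree.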
1. Construct a decision tree based on information gain using the training data in Table 1 to predict whether a user on the network will perform a cyber attack, given the features “Website” and “Login Attempts”.
2. Use the constructed decision tree to make predictions on the data provided in Table 2.
Table 2
User | Website | Login Attempts | Label
11 | google.com | 5 | ?
12 | google.com | 15 | ?
13 | reddit.com | 15 | ?
14 | howtohack.com | 100 | ?
3. The ground truth labels of the testing data were Benign, Benign, Benign, Attack for Users 11, 12, 13 and 14, respectively. Compute the accuracy and F1-score of the classifier on the test set.
4. What effect does feature normalization have, if any, during the construction of a decision tree?
5. Suppose we wish to use our decision tree from Question 1 to classify a new user whose most visited website was youtube.com and who had 20 failed login attempts. Can our decision tree assign a label?
6. Suggest a modification to our decision tree that would allow it to assign a label to the user in Question 5.
7. (Extra) Go back to Question 1 and try to find all possible decision trees that we could have used instead. Which one gives the best classification performance in terms of accuracy and F1-score? If we combined these into an ensemble, would we expect an increase in performance?
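For Questions 2 and 3, the sketch below shows the mechanics of scoring predictions on Table 2 against the ground-truth labels given in Question 3. It uses a hypothetical depth-1 stump (only the root split Login Attempts ≤ 7, with each side labelled by its majority class in Table 1) rather than the full tree, and it assumes “Attack” is the positive class for the F1-score. The names `stump_predict`, `accuracy` and `f1` are illustrative; replace the stump with your own tree's predictions to score it.

```python
# Ground-truth labels for Users 11-14, as given in Question 3.
Y_TRUE = ["Benign", "Benign", "Benign", "Attack"]

# Table 2 test rows as (website, failed login attempts).
TEST = [("google.com", 5), ("google.com", 15), ("reddit.com", 15), ("howtohack.com", 100)]


def stump_predict(row, threshold=7.0):
    """Hypothetical depth-1 stump: the root split on login attempts only.
    In Table 1, every user with <= 7 attempts is Benign and the majority
    above 7 is Attack. This is NOT the full tree Question 1 asks for."""
    _, login = row
    return "Benign" if login <= threshold else "Attack"


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive="Attack"):
    """F1-score for the chosen positive class, via 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Returning 0.0 when F1 is undefined (no positives at all) is a convention.
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


if __name__ == "__main__":
    preds = [stump_predict(row) for row in TEST]
    print("Predictions:", preds)
    print("Accuracy:", accuracy(Y_TRUE, preds))
    print("F1 (Attack):", f1(Y_TRUE, preds))
```

For the stump, this gives an accuracy of 0.5 and an F1-score of 0.5; the full tree from Question 1 may score differently. The expression 2TP / (2TP + FP + FN) is algebraically equal to the harmonic mean of precision and recall.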
2023-09-04