INFS4203/7203 Data Mining

Tutorial Week 3 - Decision Tree

Semester 2, 2023

Question 1/1: Decision Tree

A university hired a cyber security expert, Alice, to identify potential cyber attacks on its network. Over the course of one week, Alice recorded the most frequently visited website and the number of failed login attempts for ten users, along with a label indicating whether the user had performed any attacks on the network. The data are shown in Table 1.

Table 1: Training Data

User	Website	Login Attempts	Label
1	google.com	0	Benign
2	google.com	4	Benign
3	google.com	20	Benign
4	google.com	30	Attack
5	reddit.com	3	Benign
6	reddit.com	32	Attack
7	reddit.com	29	Attack
8	howtohack.com	3	Benign
9	howtohack.com	10	Attack
10	howtohack.com	47	Attack

To reduce its exposure to attacks, the university wishes to use this data to build a classifier that can detect future attacks.
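As a starting point for the questions that follow, the sketch below encodes Table 1 and computes the entropy-based information gain of a candidate split. The names `TRAIN`, `entropy` and `information_gain` are illustrative, and the login threshold of 15 is an arbitrary value chosen only to show the mechanics; it is not claimed to be the best split.

```python
from collections import Counter
from math import log2

# Table 1 as (website, failed login attempts, label) tuples.
TRAIN = [
    ("google.com", 0, "Benign"),
    ("google.com", 4, "Benign"),
    ("google.com", 20, "Benign"),
    ("google.com", 30, "Attack"),
    ("reddit.com", 3, "Benign"),
    ("reddit.com", 32, "Attack"),
    ("reddit.com", 29, "Attack"),
    ("howtohack.com", 3, "Benign"),
    ("howtohack.com", 10, "Attack"),
    ("howtohack.com", 47, "Attack"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, key):
    """Entropy of the labels minus the weighted entropy after grouping rows by key(row)."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[2] for row in rows]) - remainder


# Multi-way split on the categorical feature.
gain_website = information_gain(TRAIN, key=lambda r: r[0])

# Binary split on the numeric feature; the threshold 15 is an assumption for illustration.
gain_login = information_gain(TRAIN, key=lambda r: r[1] <= 15)

print(f"Gain(Website)     = {gain_website:.4f}")
print(f"Gain(Login <= 15) = {gain_login:.4f}")
```

To build a full tree, the same function can be evaluated for every candidate threshold on the login count (for example, the midpoints between consecutive sorted values) and then applied again to each resulting subset.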

1. Using the training data in Table 1, construct a decision tree based on information gain to predict whether a user on the network will perform a cyber attack, given the features “Website” and “Login Attempts”.

2. Use the constructed decision tree to make predictions on the data provided in Table 2.

Table 2: Testing Data

User	Website	Login Attempts	Label
11	google.com	5
12	google.com	15
13	reddit.com	15
14	howtohack.com	100

3. The ground truth labels of the testing data were Benign, Benign, Benign, Attack for Users 11, 12, 13 and 14, respectively. Compute the accuracy and F1-score of the classifier on the test set.

4. What effect does feature normalization have, if any, during the construction of a decision tree?

5. Suppose we wish to use our decision tree from Question 1 to classify a new user whose most visited website was youtube.com and who had 20 failed login attempts. Can our decision tree assign a label?

6. Suggest a modification to our decision tree that would allow it to assign a label to the user in Question 5.

7. (Extra) Go back to Question 1 and try to find all possible decision trees that we could have constructed instead. Which one gives the best classification performance in terms of accuracy and F1-score? If we combined these trees into an ensemble, would we expect performance to improve?
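For the metric calculations in Questions 3 and 7, a minimal sketch of accuracy and F1-score is given below. It assumes "Attack" is the positive class, which the worksheet does not state explicitly. The list `y_pred` is a placeholder and not the output of any particular tree; replace it with the predictions of the tree being evaluated.

```python
def accuracy_and_f1(y_true, y_pred, positive="Attack"):
    """Return (accuracy, F1) for two equal-length label lists.

    F1 is computed for the `positive` class as 2*TP / (2*TP + FP + FN),
    and is defined as 0.0 when that denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Ground truth for Users 11-14, as given in Question 3.
y_true = ["Benign", "Benign", "Benign", "Attack"]

# Placeholder predictions (an assumption for illustration only).
y_pred = ["Benign", "Attack", "Benign", "Attack"]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, F1 = {f1:.2f}")
```

With only four test users, a single changed prediction moves accuracy by 0.25, so comparisons between candidate trees on this test set should be read with caution.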

2023-09-04
