

INFS4203/7203 Data Mining

Tutorial Week 4 - k-Nearest Neighbors and Naive Bayes

Semester 2, 2023

Question 1/4: Naive Bayes: discrete feature values

Consider a small subset of the Car Evaluation dataset in Table 1 as the training dataset.

1. Construct a naive Bayes classifier! (Apply the Laplacian correction if necessary.)

2. Calculate the training F1-score of the classifier constructed in Question 1.1!

3. (True/False) For naive Bayes classifiers, we assume that the distributions of the input variables $x_1, x_2, \ldots, x_f$ are independent, where $x_i$ denotes the $i$-th feature for $i = 1, 2, \ldots, f$, and $f$ denotes the number of features.
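The counting behind Questions 1.1 and 1.2 can be sketched in Python. This is a minimal illustration under stated assumptions, not a model answer: the toy rows and labels are hypothetical (Table 1 is not reproduced here), `alpha=1` is assumed to implement the Laplacian correction as add-one smoothing over the feature values seen in training, the class `acc` is assumed to be the positive class for the F1-score, and ties would be broken by the order in which classes first appear.

```python
from collections import Counter


def train_naive_bayes(rows, labels, alpha=1):
    """Fit a categorical naive Bayes model.

    alpha=1 applies the Laplacian (add-one) correction; alpha=0 gives raw frequencies.
    """
    n_features = len(rows[0])
    class_counts = Counter(labels)
    # Distinct values of each feature seen in training (used in the smoothing denominator).
    feature_values = [sorted({row[i] for row in rows}) for i in range(n_features)]
    # counts[c][i][v] = number of class-c rows whose i-th feature equals v.
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[c][i][v] += 1

    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {
        c: [
            {v: (counts[c][i][v] + alpha) / (class_counts[c] + alpha * len(vals)) for v in vals}
            for i, vals in enumerate(feature_values)
        ]
        for c in class_counts
    }
    return priors, likelihoods


def predict(priors, likelihoods, row):
    """Return the class c maximising P(c) * prod_i P(x_i | c)."""
    def score(c):
        p = priors[c]
        for i, v in enumerate(row):
            p *= likelihoods[c][i].get(v, 0.0)  # a value never seen in training scores 0
        return p
    return max(priors, key=score)


def f1_score(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (features: buying price, safety); NOT the rows of Table 1.
rows = [("high", "low"), ("high", "high"), ("low", "high"), ("low", "low"), ("high", "high")]
labels = ["unacc", "unacc", "acc", "acc", "acc"]

priors, likelihoods = train_naive_bayes(rows, labels, alpha=1)
preds = [predict(priors, likelihoods, r) for r in rows]  # training (resubstitution) predictions
train_f1 = f1_score(labels, preds, positive="acc")
print(preds)               # ['unacc', 'unacc', 'acc', 'acc', 'unacc']
print(round(train_f1, 3))  # 0.8
```

On these toy rows the fifth point, ("high", "high") labeled `acc`, is outscored by `unacc`, so the training F1-score for `acc` is 0.8 rather than 1. Note that `predict` multiplies the per-feature probabilities separately within each class, so the factorization it relies on is conditioned on the class.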

Question 2/4: Naive Bayes: continuous feature values

Consider a small subset of the Banknote Authentication dataset in Table 2 as the training dataset.

1. Construct a naive Bayes classifier!

Here, the class-conditional densities are assumed to be Gaussian distributions whose probability density function $f(x)$ is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

where $\mu$ and $\sigma$ denote the mean and standard deviation, respectively.

2. Calculate the training F1-score of the classifier constructed in Question 2.1!

3. Count the parameters that must be estimated for the classifier constructed in Question 2.1! Then, state a general expression for the number of estimated parameters in terms of the number of classes and the number of features!
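A Gaussian naive Bayes classifier of the kind described in Question 2.1 can be sketched as follows. The two-feature data are hypothetical stand-ins for Table 2, and several conventions are assumptions that the tutorial may handle differently: the sample standard deviation (denominator n − 1) is used, class 1 is treated as the positive class, and the hypothetical helper `count_gaussian_params` counts only the per-class means and standard deviations, not the class priors.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {class: (prior, means, stds)} with one mean and one std per feature."""
    groups = defaultdict(list)
    for row, c in zip(X, y):
        groups[c].append(row)
    model = {}
    for c, rows in groups.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # Sample std (n - 1); needs >= 2 rows per class and non-constant features.
        stds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1)) for col, m in zip(cols, means)]
        model[c] = (n / len(y), means, stds)
    return model


def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def predict(model, row):
    """Pick the class maximising prior * product of per-feature Gaussian densities."""
    def score(c):
        prior, means, stds = model[c]
        p = prior
        for x, mu, sigma in zip(row, means, stds):
            p *= gaussian_pdf(x, mu, sigma)
        return p
    return max(model, key=score)


def count_gaussian_params(model):
    # One mean and one std per (class, feature) pair; class priors are not counted here.
    return sum(len(means) + len(stds) for _, means, stds in model.values())


def f1_score(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-feature data; NOT the rows of Table 2.
X = [[2.0, 1.0], [3.0, 2.0], [4.0, 3.0], [-2.0, -1.0], [-3.0, 0.0], [-4.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
preds = [predict(model, row) for row in X]
print(model[0])                         # (0.5, [3.0, 2.0], [1.0, 1.0])
print(preds)                            # [0, 0, 0, 1, 1, 1]
print(f1_score(y, preds, positive=1))   # 1.0
print(count_gaussian_params(model))     # 8
```

Under these conventions the count is 2 × (number of classes) × (number of features), here 2 × 2 × 2 = 8; if the class priors are counted as well, (number of classes − 1) free prior probabilities are added.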

Question 3/4: k-Nearest Neighbors (k-NN)

Consider the dataset in Table 3 (the standardized version of the Banknote Authentication data) as the training dataset.

1. Construct a k-NN classifier using the Manhattan distance function and k = 1, then calculate its training F1-score!

2. Reconstruct the above k-NN classifier with k = 3, then recalculate its training F1-score!

3. Why do we need to use the standardized version of the Banknote Authentication dataset, i.e., Table 3, instead of Table 2?
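The k-NN procedure of Questions 3.1 and 3.2 can be sketched in Python as below. The six two-feature points are hypothetical (Tables 2 and 3 are not reproduced here). `standardize` shows one common way a table like Table 3 could be produced, z-scoring each feature with the sample standard deviation; that this matches how Table 3 was built is an assumption. Class 1 is assumed to be the positive class, and equal distances are broken by training-set order.

```python
import statistics
from collections import Counter


def standardize(X):
    """Z-score every feature column so that no single feature dominates the distance."""
    cols = list(zip(*X))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.stdev(c) for c in cols]  # sample std (n - 1): an assumption
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(X_train, y_train, query, k):
    """Majority vote among the k training points closest to `query` (Manhattan distance)."""
    # sorted() is stable, so equal distances keep their training-set order.
    order = sorted(range(len(X_train)), key=lambda i: manhattan(X_train[i], query))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]


def f1_score(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical raw data; NOT the values of Table 2 or Table 3.
X_raw = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
X = standardize(X_raw)

results = {}
for k in (1, 3):
    # Training predictions: each point is queried against the whole training set,
    # including itself at distance 0.
    results[k] = [knn_predict(X, y, x, k) for x in X]
    print(k, results[k], round(f1_score(y, results[k], positive=1), 3))
```

On this toy data, k = 1 reproduces every label (F1 = 1.0), because each training point is its own nearest neighbor unless identical points carry different labels. With k = 3 the fourth point is outvoted by its two class-0 neighbors, giving predictions [0, 0, 0, 0, 1, 1] and F1 = 0.8.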

Question 4/4: k-Nearest Neighbors (k-NN): distance functions

1. Show the relationship between the Manhattan, Euclidean, and Minkowski distance functions!

2. Sketch the unit circles under the p = 1, p = 2, and p = ∞ norms (used as distance functions)!

3. Observe Figure 1, on handwritten ZIP-code classification (taken partially from Fig. 13.10 in Hastie et al., 2008). It shows several rotated versions of a digit ‘3’. These rotated versions generally have different Euclidean distances to any other given data point. Such differences are undesirable, however, because any rotated version of a ‘3’ is still a ‘3’, so its distance to a particular point should stay the same. A rotation-invariant distance should therefore be used so that the nearest-neighbor classifier makes reliable predictions even when digit images are rotated.

Design a distance function that can approximate rotation invariance for the handwritten digit classification!

Figure 1: Several rotated versions of a digit ‘3’ (from Fig. 13.10 in Hastie et al., 2008).
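The distance functions of Question 4 can be sketched in Python. `minkowski` shows the Manhattan (p = 1), Euclidean (p = 2), and Chebyshev (p = ∞) distances as special cases of one formula, and `unit_circle_points` returns points at L_p distance 1 from the origin that can be passed to any plotting library to sketch the unit circles (a diamond, a circle, and a square). For Question 4.3, `rotation_invariant_distance` is one simple, hypothetical design, not the method of Hastie et al.: it takes the minimum Euclidean distance over a grid of rotations of one image, a brute-force alternative to approaches such as tangent distance. The nearest-neighbor image rotation, the ±30° angle grid, and the 3×3 toy images are all assumptions made for illustration.

```python
import math


def minkowski(a, b, p):
    """L_p distance: p=1 Manhattan, p=2 Euclidean, p=math.inf Chebyshev (max |a_i - b_i|)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)  # limit of the L_p distance as p -> infinity
    return sum(d ** p for d in diffs) ** (1 / p)


def unit_circle_points(p, n=16):
    """Points at L_p distance 1 from the origin, one per direction (cos t, sin t)."""
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        d = (math.cos(t), math.sin(t))
        r = minkowski(d, (0.0, 0.0), p)
        points.append((d[0] / r, d[1] / r))
    return points


def rotate_image(img, degrees):
    """Rotate a square image (list of rows) about its centre; nearest-neighbor sampling, 0 outside."""
    n = len(img)
    c = (n - 1) / 2
    cos_t, sin_t = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for col in range(n):
            # Inverse-map each output pixel to its source position, rounded to the nearest pixel.
            x, y = col - c, r - c
            src_col = round(cos_t * x + sin_t * y + c)
            src_row = round(-sin_t * x + cos_t * y + c)
            if 0 <= src_row < n and 0 <= src_col < n:
                out[r][col] = img[src_row][src_col]
    return out


def euclidean(a, b):
    """Pixel-wise Euclidean distance between two equally sized images."""
    return math.sqrt(sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)))


def rotation_invariant_distance(a, b, angles=range(-30, 31, 5)):
    """Smallest Euclidean distance between b and any rotation of a over the given angles."""
    return min(euclidean(rotate_image(a, t), b) for t in angles)


# Hypothetical 3x3 images: a vertical bar and the same bar rotated by 90 degrees.
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(minkowski((0, 0), (3, 4), 1), minkowski((0, 0), (3, 4), 2), minkowski((0, 0), (3, 4), math.inf))
print(euclidean(vertical, horizontal))                                    # 2.0
print(rotation_invariant_distance(vertical, horizontal, angles=[0, 90]))  # 0.0
```

Keeping the angle range small matters for digits: a '6' rotated by 180° looks like a '9', so unrestricted rotation would make those classes indistinguishable. Because the rotation here crops and resamples pixels, the result may not be exactly symmetric in `a` and `b`, and its cost grows linearly with the number of angles tried.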

References: (Hastie et al., 2008) Trevor Hastie, Robert Tibshirani, and J. H. Friedman. 2008. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. Available at https://hastie.su.domains/Papers/ESLII.pdf.

2023-09-04
