INFS4203/7203 Data Mining

Tutorial Week 5 - Clustering and k-Means

Semester 2, 2023

Question 1/3: Clustering

1. Discuss some real-world applications of clustering in Data Mining.

2. Differentiate between supervised and unsupervised learning.

Question 2/3: k-Means

1. Assume k-means with k=3 is used to cluster the following six 2-dimensional points:

Use the Manhattan distance metric to identify the clusters, then compare the resulting clusters and the number of iterations with your peers.

2. How can the Sum of Squared Errors (SSE) be used in clustering? Using the results obtained in Q2.1, calculate the SSE based on Manhattan distance.

3. List some limitations of k-means.

4. What is the difference between the Manhattan Distance and Euclidean Distance in Clustering?

5. Apply k-means to cluster the following points: 18, 25, 28, 35, using Manhattan distance, and determine the optimal value of k from the SSE using the elbow method.

6. (True/False) The smaller the SSE, the better the clustering performance.
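
The procedure behind Q2.2 and Q2.5 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a model answer: the helper names (`manhattan`, `kmeans`, `sse`) are hypothetical, the initial centroids are assumed to be the first k points, centroids are updated as the cluster mean, and the SSE is taken as the sum of squared Manhattan distances from each point to its assigned centroid. A different initialisation may converge to different clusters and a different SSE.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd-style k-means: Manhattan distance for assignment, mean for update."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: manhattan(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared Manhattan distances to the assigned centroids."""
    return sum(manhattan(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# The four one-dimensional points from Q2.5, written as 1-tuples.
points = [(18,), (25,), (28,), (35,)]
sse_by_k = {}
for k in range(1, len(points) + 1):
    centroids, labels = kmeans(points, k)
    sse_by_k[k] = sse(points, centroids, labels)
    print(k, round(sse_by_k[k], 2))
```

Plotting the printed SSE values against k and looking for the point where the decrease flattens gives the elbow. Note that in one dimension the Manhattan and Euclidean distances coincide, so the choice of metric only matters for the 2-dimensional points in Q2.1.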

(Extra) Question 3/3: Clustering: soft clustering and hard clustering

1. Briefly describe the difference between fuzzy clustering (soft clustering) and hard clustering.
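
As a small illustration of the distinction, the sketch below contrasts a hard assignment with soft membership degrees for a single one-dimensional point. It assumes the standard fuzzy c-means membership formula with fuzzifier m = 2; the function names and the two centroids are hypothetical values chosen only for the example.

```python
def hard_assignment(x, centroids):
    """Hard clustering: the point belongs to exactly one (nearest) cluster."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def soft_memberships(x, centroids, m=2.0):
    """Fuzzy c-means style memberships: one degree per cluster, summing to 1."""
    dists = [abs(x - c) for c in centroids]
    zeros = dists.count(0)
    if zeros:
        # The point sits exactly on a centroid: full membership there.
        return [1.0 / zeros if d == 0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
        for d_j in dists
    ]


centroids = [21.5, 31.5]  # hypothetical cluster centres
x = 25

label = hard_assignment(x, centroids)
memberships = soft_memberships(x, centroids)
print(label, [round(u, 3) for u in memberships])
```

The hard assignment returns a single cluster index, while the soft version returns a degree of membership for every cluster, with the nearer centroid receiving the larger share.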