DSME6650 Assignment 02
Submission Deadline: 20 November 2022, 23:59
1. Consider the training data in the following table for a binary classification problem with the “Purchased VIP Products” attribute as the label. That is, we want to predict whether a customer will purchase VIP products.
Customer ID | Property Owner | Car Owner | Average Spending Per Month | Purchased VIP Products
0003 | yes | yes | 11,000 | yes
0012 | no | yes | 15,000 | no
0017 | yes | no | 15,000 | no
0026 | no | no | 18,000 | no
0021 | no | yes | 13,000 | no
0029 | yes | yes | 16,000 | yes
0030 | yes | no | 17,000 | yes
0031 | no | no | 14,000 | yes
0081 | no | yes | 17,000 | no
(a) Compute the entropy of this training data set (with respect to the label).
(b) Which attribute (“Property Owner”, “Car Owner”, “Average Spending Per Month”) provides the best split according to information gain? [Note that “Average Spending Per Month” is a numeric attribute.]
(c) John and David, the two data scientists in your team, used different approaches to construct the decision tree. John selected the split using information gain, and David using gain ratio. The marketing manager asked you what the difference between these two approaches is, and whether one is more reliable than the other. Provide a brief explanation to the marketing manager.
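The quantities named in parts (a) to (c) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed solution method: the function names are hypothetical, the rows are transcribed from the table above, and the numeric attribute is handled under the assumption that candidate thresholds are midpoints between consecutive distinct values.

```python
from collections import Counter
from math import log2

# Rows transcribed from the table: (id, property_owner, car_owner, spending, label)
rows = [
    ("0003", "yes", "yes", 11000, "yes"), ("0012", "no", "yes", 15000, "no"),
    ("0017", "yes", "no", 15000, "no"),   ("0026", "no", "no", 18000, "no"),
    ("0021", "no", "yes", 13000, "no"),   ("0029", "yes", "yes", 16000, "yes"),
    ("0030", "yes", "no", 17000, "yes"),  ("0031", "no", "no", 14000, "yes"),
    ("0081", "no", "yes", 17000, "no"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups if g)

def gain_ratio(labels, groups):
    """Information gain divided by the split information of the partition."""
    n = len(labels)
    split_info = sum(-(len(g) / n) * log2(len(g) / n) for g in groups if g)
    return information_gain(labels, groups) / split_info if split_info > 0 else 0.0

def split_categorical(values, labels):
    """Group the labels by attribute value."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def best_numeric_split(values, labels):
    """Assumption: test midpoints between consecutive distinct values (v <= t vs. v > t)."""
    distinct = sorted(set(values))
    best_t, best_gain = None, 0.0
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = information_gain(labels, [left, right])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

labels = [r[4] for r in rows]
property_groups = split_categorical([r[1] for r in rows], labels)
car_groups = split_categorical([r[2] for r in rows], labels)

print("entropy:", entropy(labels))
print("gain (Property Owner):", information_gain(labels, property_groups))
print("gain (Car Owner):", information_gain(labels, car_groups))
print("best numeric split:", best_numeric_split([r[3] for r in rows], labels))
print("gain ratio (Property Owner):", gain_ratio(labels, property_groups))
```

Gain ratio divides the gain by the split information, so it penalises attributes that break the data into many small partitions; for a two-way split the two criteria often, but not always, rank attributes the same way.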
2. Consider the decision trees below.
Assume that they are generated from a dataset that contains 16 binary attributes and 3 classes, C1, C2, and C3. Compute the total description length of each decision tree according to the following.
The total description length of a tree is given by
$\mathrm{Cost}(\mathit{tree}, \mathit{data}) = \mathrm{Cost}(\mathit{tree}) + \mathrm{Cost}(\mathit{data} \mid \mathit{tree})$
Each internal node of the tree is encoded by the ID of the splitting attribute. If there are m attributes, the cost of encoding each attribute is $\log_2 m$ bits.
Each leaf is encoded using the ID of the class it is associated with. If there are k classes, the cost of encoding a class is $\log_2 k$ bits.
Cost(tree) is the cost of encoding all the nodes in the tree. To simplify the computation, you can assume that the total cost of the tree is obtained by adding up the costs of encoding each internal node and each leaf node.
Cost(data|tree) is encoded using the classification errors the tree commits on the training set. Each error is encoded by $\log_2 n$ bits, where n is the total number of training instances.
The minimum description length (MDL) principle is a model selection principle under which the model giving the shortest description of the data is the best model. Which decision tree is better according to the MDL principle?
3. Consider the set of one-dimensional data points {6, 12, 18, 24, 28, 42, 48}.
(a) For each of the following sets of initial centroids, create two clusters by assigning each point to the nearest centroid, and then calculate the total squared error for each set of two clusters. Show both the clusters and the total squared error for each set of centroids.
(i) {18, 45}
(ii) {15, 40}
(b) Do both sets of centroids represent converged solutions?
(c) What are the two clusters produced by MIN?
(d) Which technique, K-means (take the result with the lowest squared error) or MIN, seems to produce the most natural clustering (in terms of contiguity)? Explain the behavior.
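The steps in this question can be sketched for one-dimensional data as follows. The function names are hypothetical. Two assumptions are worth stating: the squared error here is measured against whatever centers are passed in (you may instead want to measure it against the recomputed cluster means), and MIN is read as single-link agglomerative clustering, where the distance between two clusters is the smallest distance between any pair of their points.

```python
points = [6, 12, 18, 24, 28, 42, 48]

def assign(points, centroids):
    """Assign each point to its nearest centroid (ties go to the earlier centroid)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
        clusters[nearest].append(p)
    return clusters

def sse(clusters, centers):
    """Total squared error of the clusters with respect to the given centers."""
    return sum((p - c) ** 2 for cluster, c in zip(clusters, centers) for p in cluster)

def kmeans_step(points, centroids):
    """One K-means iteration: assign, then recompute each centroid as the cluster mean.
    Assumes no cluster ends up empty."""
    clusters = assign(points, centroids)
    return [sum(c) / len(c) for c in clusters], clusters

def single_link(points, k):
    """Naive single-link (MIN) agglomerative clustering down to k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

for centroids in ([18, 45], [15, 40]):
    clusters = assign(points, centroids)
    new_centroids, _ = kmeans_step(points, centroids)
    print(centroids, clusters, sse(clusters, centroids), new_centroids)

print("single link:", single_link(points, 2))
```

Comparing `new_centroids` with the starting centroids, and re-running `assign` with the new ones, is one way to check whether another iteration would change anything.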
4. Use the banking data set – marketing targets (available here) to build a model to predict whether a customer is likely to convert (i.e. to subscribe to a term deposit). Please include the following in your submission.
(a) A brief explanation of your model-building process.
If you use Python or R, please include the source code in your submission. If you use Excel or Weka, please provide a brief description of how you obtained the results.
(b) Evaluation of the performance of your model.
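For those working in Python, one possible starting point is sketched below using pandas and scikit-learn. It is an assumption-laden outline, not a required approach: the label column name `y`, the positive value `"yes"`, and the file name and separator in the usage comment are all hypothetical and need to be checked against the actual data set. Logistic regression is used only as a simple baseline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def fit_and_evaluate(df, label_col="y", positive="yes", seed=0):
    """Fit a baseline classifier and report hold-out metrics.
    label_col and positive are assumed names; adjust them to the real data."""
    X = df.drop(columns=[label_col])
    y = (df[label_col] == positive).astype(int)

    # Scale numeric columns; one-hot encode everything else.
    numeric = X.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in X.columns if c not in numeric]
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        # class_weight="balanced" is a choice made here because conversion
        # data sets are often imbalanced; it is not dictated by the assignment.
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])

    # Stratified hold-out split so both classes appear in the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)

    scores = model.predict_proba(X_test)[:, 1]
    metrics = {
        "roc_auc": roc_auc_score(y_test, scores),
        "report": classification_report(y_test, model.predict(X_test), output_dict=True),
    }
    return model, metrics

# Usage (hypothetical file name and separator):
# df = pd.read_csv("train.csv", sep=";")
# model, metrics = fit_and_evaluate(df)
# print(metrics["roc_auc"], metrics["report"])
```

When converters are rare, accuracy alone can look good for a model that predicts "no" for everyone, so precision, recall, and ROC AUC for the positive class are usually more informative for part (b).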
5. Recall that for neural networks with sigmoid activation functions of the form $G(z) = \frac{1}{1 + \exp(-z)}$,
the value of neuron k at layer l is computed as
$v_k^{(l)} = G\left(w_0 + \sum_i w_{k,i}\, v_i^{(l-1)}\right)$
Design neural networks that compute the following Boolean functions, where X1 and X2 are Boolean inputs and we will treat the final output y as 1 if the output of the sigmoid unit is greater than 0.5 and 0 otherwise.
(a) Implement the logical OR function y = X1 ∨ X2 with a single unit with three weights and two inputs. Explain whether you can implement the logical AND function with a single unit.
(b) Implement the XOR function with the smallest number of units and draw your network showing all the weights.
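Candidate weights can be checked against a truth table with a short script. The sketch below is one illustrative possibility, not the only valid answer: the weight values are hypothetical choices, and the XOR network shown uses two hidden units plus one output unit, which assumes inputs connect only to the hidden layer. Whether a smaller design is possible depends on the connections you allow.

```python
from math import exp

def sigmoid(z):
    """G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + exp(-z))

def unit(weights, inputs):
    """One sigmoid unit; weights = (w0, w1, ..., wn) with w0 as the bias."""
    w0, *ws = weights
    return sigmoid(w0 + sum(w * v for w, v in zip(ws, inputs)))

def to_bit(activation):
    """Treat the output as 1 if the activation exceeds 0.5, else 0."""
    return 1 if activation > 0.5 else 0

# Illustrative weights (bias, w1, w2); many other values work equally well.
OR_W = (-5.0, 10.0, 10.0)
AND_W = (-15.0, 10.0, 10.0)
NAND_W = (15.0, -10.0, -10.0)

def xor_net(x1, x2):
    """XOR as AND(OR(x1, x2), NAND(x1, x2)) using three sigmoid units."""
    h1 = unit(OR_W, (x1, x2))
    h2 = unit(NAND_W, (x1, x2))
    return unit(AND_W, (h1, h2))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "OR:", to_bit(unit(OR_W, (x1, x2))),
              "AND:", to_bit(unit(AND_W, (x1, x2))),
              "XOR:", to_bit(xor_net(x1, x2)))
```

The large weight magnitudes push the sigmoid close to 0 or 1, so the hidden activations behave almost like Boolean values when they are fed into the output unit.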
2022-11-15