
DSME6650

Assignment 02

Submission Deadline: 20 November 2022, 23:59

1. Consider the training data in the following table for a binary classification problem with the “Purchased VIP Products” attribute as the label. That is, we want to predict whether a customer will purchase VIP products.

Customer ID	Property Owner	Car Owner	Average Spending Per Month	Purchased VIP Products
0003	yes	yes	11,000	yes
0012	no	yes	15,000	no
0017	yes	no	15,000	no
0026	no	no	18,000	no
0021	no	yes	13,000	no
0029	yes	yes	16,000	yes
0030	yes	no	17,000	yes
0031	no	no	14,000	yes
0081	no	yes	17,000	no

(a) Compute the entropy of this training data set (with respect to the label).

(b) Which attribute (“Property Owner”, “Car Owner”, “Average Spending Per Month”) provides the best split according to information gain? [Note that “Average Spending Per Month” is a numeric attribute.]

(c) John and David, two data scientists on your team, used different approaches to construct the decision tree: John selected splits using information gain, and David using gain ratio. The marketing manager asked you what the difference between these two approaches is, and whether one is more reliable than the other. Provide a brief explanation for the marketing manager.
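The quantities named in Question 1 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed solution method: the function names (`entropy`, `information_gain`, `gain_ratio`, `best_threshold`) are hypothetical, and the handling of the numeric attribute (trying midpoints between consecutive distinct values as binary thresholds) is an assumed convention. The lists are transcribed from the table above.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by partitioning `labels` on `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information of the partition."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

def best_threshold(numbers, labels):
    """Assumed convention: test midpoints between consecutive distinct values."""
    points = sorted(set(numbers))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max((information_gain([x <= t for x in numbers], labels), t)
               for t in candidates)

# Columns transcribed from the training table, in row order.
property_owner = ["yes", "no", "yes", "no", "no", "yes", "yes", "no", "no"]
car_owner = ["yes", "yes", "no", "no", "yes", "yes", "no", "no", "yes"]
spending = [11000, 15000, 15000, 18000, 13000, 16000, 17000, 14000, 17000]
purchased = ["yes", "no", "no", "no", "no", "yes", "yes", "yes", "no"]

print("entropy:", entropy(purchased))
print("gain (Property Owner):", information_gain(property_owner, purchased))
print("gain (Car Owner):", information_gain(car_owner, purchased))
print("best (gain, threshold) for spending:", best_threshold(spending, purchased))
```

Gain ratio normalizes information gain by the entropy of the split itself, which penalizes attributes that fragment the data into many small partitions. That is the distinction part (c) asks about.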

2. Consider the decision trees below.

Assume that they are generated from a dataset that contains 16 binary attributes and 3 classes, C1, C2, and C3. Compute the total description length of each decision tree according to the following.

The total description length of a tree is given by

$\mathrm{Cost}(\mathrm{tree}, \mathrm{data}) = \mathrm{Cost}(\mathrm{tree}) + \mathrm{Cost}(\mathrm{data} \mid \mathrm{tree})$

Each internal node of the tree is encoded by the ID of the splitting attribute. If there are $m$ attributes, the cost of encoding each attribute is $\log_2 m$ bits.

Each leaf is encoded using the ID of the class it is associated with. If there are $k$ classes, the cost of encoding a class is $\log_2 k$ bits.

$\mathrm{Cost}(\mathrm{tree})$ is the cost of encoding all the nodes in the tree. To simplify the computation, you can assume that the total cost of the tree is obtained by adding up the costs of encoding each internal node and each leaf node.

$\mathrm{Cost}(\mathrm{data} \mid \mathrm{tree})$ is encoded using the classification errors the tree commits on the training set. Each error is encoded by $\log_2 n$ bits, where $n$ is the total number of training instances.

The minimum description length (MDL) principle is a model selection principle under which the model giving the shortest description of the data is the best model. Which decision tree is better according to the MDL principle?
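The cost formula above translates directly into a small helper. The sketch below is illustrative only: the function name `description_length` is hypothetical, and the node counts, error count, and training-set size in the usage example are made-up placeholders, not values read from the trees in this question.

```python
from math import log2

def description_length(internal_nodes, leaves, errors, m, k, n):
    """Total cost in bits: Cost(tree) + Cost(data | tree).

    internal_nodes, leaves -- node counts of the tree
    errors                 -- training errors committed by the tree
    m, k, n                -- number of attributes, classes, training instances
    """
    tree_cost = internal_nodes * log2(m) + leaves * log2(k)
    data_cost = errors * log2(n)
    return tree_cost + data_cost

# Hypothetical tree: 2 internal nodes, 3 leaves, 7 errors, n = 64 instances.
print(description_length(internal_nodes=2, leaves=3, errors=7, m=16, k=3, n=64))
```

Under MDL, the tree with the smaller returned value is preferred. A larger tree pays more in `tree_cost` and is only worthwhile if it removes enough errors to reduce `data_cost` by more.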

3. Consider the set of one-dimensional data points {6, 12, 18, 24, 28, 42, 48}

(a) For each of the following sets of initial centroids, create two clusters by assigning each point to the nearest centroid, and then calculate the total squared error for each set of two clusters. Show both the clusters and the total squared error for each set of centroids.

(i) {18, 45}

(ii) {15, 40}

(b) Do both sets of centroids represent converged solutions?

(d) Which technique, K-means (taking the result with the lowest squared error) or MIN, seems to produce the most natural (in terms of contiguity) clustering? Explain this behavior.
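The assignment step and the squared-error calculation in Question 3 can be sketched as follows. The function names are hypothetical, and the sketch assumes that the squared error is measured against the supplied centroids (not against the recomputed cluster means) and that ties go to the first centroid.

```python
def assign_and_score(points, centroids):
    """Assign each point to its nearest centroid; return (clusters, total SSE)."""
    clusters = [[] for _ in centroids]
    sse = 0.0
    for p in points:
        # Index of the nearest centroid; ties resolve to the lowest index.
        j = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
        clusters[j].append(p)
        sse += (p - centroids[j]) ** 2
    return clusters, sse

def updated_centroids(clusters):
    """Mean of each cluster (assumes no cluster is empty)."""
    return [sum(c) / len(c) for c in clusters]

points = [6, 12, 18, 24, 28, 42, 48]
for start in ([18, 45], [15, 40]):
    clusters, sse = assign_and_score(points, start)
    print(start, clusters, sse, updated_centroids(clusters))
```

One way to test convergence is to rerun `assign_and_score` with the output of `updated_centroids`: if the clusters do not change, the centroids have stabilized.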

4. Use the banking data set – marketing targets (available here) to build a model to predict whether a customer is likely to convert (i.e. to subscribe to a term deposit). Please include the following in your submission.

(a) A brief explanation of your model-building process.

If you use Python or R, please include the source code in your submission. If you use Excel or Weka, please provide a brief description of how you obtained the results.

(b) Evaluation of the performance of your model.
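For part (b), one possible way to summarize a binary classifier is through confusion-matrix metrics. The sketch below is only an illustration: the function name `binary_metrics` and the toy label vectors are hypothetical and unrelated to the banking data. If converters turn out to be a minority class, precision and recall are usually more informative than accuracy alone.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels for illustration only (1 = subscribed, 0 = did not subscribe).
print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```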

5. Recall that for neural networks with sigmoid activation functions of the form

$G(z) = \dfrac{1}{1 + \exp(-z)}$

the value of neuron $k$ at layer $l$ is computed as

$v_k^{(l)} = G\left(w_0 + \sum_i w_{k,i}\, v_i^{(l-1)}\right)$

Design neural networks that compute the following Boolean functions, where X1 and X2 are Boolean inputs and we will treat the final output y as 1 if the output of the sigmoid unit is greater than 0.5 and 0 otherwise.

(a) Implement the logical OR function y = X1 ∨ X2 with a single unit with three weights and two inputs. Explain whether you can implement the logical AND function with a single unit.

(b) Implement the XOR function with the smallest number of units and draw your network showing all the weights.
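A candidate set of weights can be checked by evaluating the unit on all four input combinations. The sketch below is a minimal helper for doing so. The names `unit` and `truth_table` are hypothetical, and the weights shown are just one illustrative choice for a single OR unit, not the only valid answer.

```python
from itertools import product
from math import exp

def sigmoid(z):
    """G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + exp(-z))

def unit(weights, inputs):
    """One sigmoid unit; weights = (w0, w1, ..., wn), with w0 the bias term."""
    w0, *ws = weights
    return sigmoid(w0 + sum(w * x for w, x in zip(ws, inputs)))

def truth_table(weights, n_inputs=2):
    """Thresholded output (1 if activation > 0.5, else 0) for every Boolean input."""
    return {xs: int(unit(weights, xs) > 0.5)
            for xs in product((0, 1), repeat=n_inputs)}

# Illustrative weights: negative bias, large positive weight on each input.
or_weights = (-10.0, 20.0, 20.0)
print(truth_table(or_weights))
```

Since the output is 1 exactly when the weighted sum is positive, a single unit can only realize linearly separable functions. This is why XOR needs more than one unit.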

2022-11-15
