闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

DSCI 552 Sample Final

2018

1. For the following classiﬁcation problem, design a single-layer perceptron that has zero error on training set.

C1 = {x1 = ┌ 1(1) ┐ , x2 = ┌ 2(1) ┐}

C2 = {x3 = ┌ 2-1 ┐ , x4 = ┌ 0(2) ┐}

C3 = {x5 = ┌ -21 ┐ , x6 = ┌ -12 ┐}

C4 = {x3 = ┌ -(-)1(1) ┐ , x4 = ┌ -(-)2(2) ┐}

solution: The patterns are shown in the following ﬁgure:

And tentative decision boundaries can be those in the following ﬁgure:

It is left to the student to verify that a single classify the vectors given that the four classes

C1 → t1 = ┌ C2 → t2 = ┌

┌

layer perceptron with two neurons can are modeled using vectors

-(-)1(1) ┐

-1 ┐

1-1 ┐

1(1) ┐

Sample Final DSCI 552, Instructor: Mohammad Reza Rajati May 4, 2018

and the weight vectors are columns of the following matrix

w = ┐

and the biases in the vector:

b = ┌ 0(1) ┐

The architecture of the Perceptron is:

2. Answer the following questions about the ﬁgure:

(a) What is the leave-one-out cross-validation error estimate for maximum margin

separation in the ﬁgure? (The answer is a number)

(b) True or False? We would expect the support vectors to remain the same in general

as we move from a linear kernel to higher order polynomial kernels.

(c) If each of the classes is modeled with a normal distribution and the variances are assumed equal, how many parameters are needed to describe the decision boundary for classiﬁcation?

(d) Assume that all data points outside the margin are unlabeled. Mark on the ﬁgure with a - the ﬁrst data point that has to be labeled, when implementing active learning. Also, mark on the ﬁgure with a + the ﬁrst data point that has to be labeled, when implementing self-training.

solution:

(a) Based on the ﬁgure we can see that removing any single point would not chance the

resulting maximum margin separator. Since all the points are initially classiﬁed correctly, the leave-one-out error is zero.

(b) There are no guarantees that the support vectors remain the same. The feature

vectors corresponding to polynomial kernels are non-linear functions of the original input vectors and thus the support points for maximum margin separation in the feature space can be quite diﬀerent.

(c) Modeling classes with normal densities with euqal variances creates linear decision boundaries. A linear decision boundary with two variables x1 and x2 is described by three parameters (w0 , w1 , w2 ), i.e.: g(x1 , x2 ) = w0 + w1 x1 + w2 x2 .

(d)

3. Consider the unlabeled two-dimensional data represented on the following ﬁgure.

(a) Using the two points marked as squares as initial centroids, draw (on that same

ﬁgure) the clusters obtained after one iteration of the k-means algorithm (k = 2).

(b) Does your solution change after another iteration of the k-means algorithm?

solution:

(a)

(b) No.

4. A company with headquarters in the Bay Area has two oﬃces in Los Angeles and San Diego. An employee in San Diego oﬃce is sent to the Los Angeles oﬃce the next day with probability 0.2 and stays in San Diego oﬃce with probability 0.8. An employee in Los Angeles oﬃce is sent to the San Diego oﬃce with probability 0.1 and stays in Los Angeles oﬃce with probability 0.9. A new employee is assigned to Los Angeles oﬃce with probability 0.7 and to San Diego oﬃce with probability 0.3. An employee in San Diego oﬃce works between six and eight hours per day with probability 0 .6, works more than eight hours with probability 0.3, and works less than six hours per day with probability 0.1. An employee in Los Angeles oﬃce works between six and eight hours per day with probability 0.15, works more than eight hours with probability 0.8, and works less than six hours per day with probability 0.05. A manager in the headquarters can only observe the number of hours each employee worked each day.

(a) Construct a Hidden Markov Model that models the observations of the manager

in their headquarters. Clearly show the parameters with matrices and vectors and draw a state transition graph for the model.

(b) If the manager observes the number of hours a new employee worked in the ﬁrst

three consecutive days of work to be 5, 9, 7, what is the most likely sequence of places at which the employee worked in those three days?

5olution:

(a) The ﬁgure below shows the HMM model of the problem. We used L, M, H to

show working less than six hours, between six and eight hours, and more than eight hours per day, respectively.

0.2

0.1

L M

The parameters will be:

πz0 = [ 0.7 0.3 ]

A = ┐

B = ┐

where the rows show LA and SD.

(b) The sequence of observations is L, H, M . We do not need to use the Viterbi

algorithm here. An exhaustive search is easy:

State

Probability

LA, LA, LA LA, LA, SD LA, SD, SD LA, SD, LA

5D， 5D， 5D

SD, LA, SD SD, SD, LA SD, LA, LA

0.7 x 0.9 x 0.9 x 0.05 x .8 x 0.15 = 0.003402

0.7 x 0.9 x 0.1 x 0.05 x .8 x 0.6 = 0.001512

0.7 x 0.1 x 0.8 x 0.05 x .3 x 0.6 = 0.000504

0.7 x 0.1 x 0.2 x 0.05 x .3 x 0.15 = 0.0000315

0.3 x 0.8 x 0.8 x 0.1 x .3 x 0.6 = 的.的的4s扌6

0.3 x 0.2 x 0.1 x 0.1 x .8 x 0.6 = 0.000288

0.3 x 0.8 x 0.2 x 0.1 x .3 x 0.15 = 0.000216

0.3 x 0.2 x 0.9 x 0.1 x .8 x 0.15 = 0.000648

Therefore, the most likely sequence of places is SD, SD, SD.

(c) For the maximum number of correct states (Expectation Maximization) we cal- culate the probability of each of states at each step from the table above:

State

P (LA P (SD

LHM) LHM)

0.5418345

0.458166

0.5816555

0.4183445

0.4272931

0.5727069

Hence, the answer using this method is LA, LA, SD.

5. Consider the two classes of patterns that are shown in the ﬁgure below. Design a multilayer network to distinguish these categories.

solution:

There are many diﬀerent multilayer networks that could solve this problem. The vectors in each class can be represented as:

┌ 1(1) ┐ ┌ -(-)1(1) ┐ ┌ 1-1 ┐ ┌ -11 ┐

x1 = '(') - 1 '(') x2 = '(') 1 '(') x3 = '(') 1 '(') x4 = '(') - 1 '(')

' -1 ' ' 1 ' ' -1 ' ' 1 '

We will design a network by ﬁrst noting that for the Class I vectors either the ﬁrst two elements or the last two elements will be “1” . The Class II vectors have alternating “1” and “-1” patterns. This leads to the network shown in the ﬁgure below. Note that x1 , x2 , x3 , x4 denote the four elements in each of the vectors x1 , x2 , x3 , x4 .

AND Operations

x1 x2

x3 x4

-1

The ﬁrst neuron in the ﬁrst layer tests the ﬁrst two elements of the input vector. If they are both “1” it outputs a “1”, otherwise it outputs a “-1” . The second neuron in the ﬁrst layer tests the last two elements of the input vector in the same way. Both of the neurons in the ﬁrst layer perform AND operations. The second layer of the network tests whether either of the outputs of the ﬁrst layer are “1” . It performs an OR operation. In this way, the network will output a “1” if either the ﬁrst two elements or the last two elements of the input vector are both “1” .

6. The following diagram shows a person’s state of mind and the actions they may perform due to happiness. Assume that

P (J) = 0.1

P (F) = .8

P (H|J, F) = 0.9

P (H|J, ~ F) = .8

P (H| ~ J, F) = 0.7

P (H| ~ J, ~ F) = 0.1

P (M|H) = 0.7

P (M| ~ H) = 0.2

P (D|H) = 0.3

P (D| ~ H) = 0.05

(a) Calculate the probability that the person watches a movie.

(b) Show that if the person is happy, having had food explains away ﬁnding a job.

solution:

(a) By the law of total probability we have:

P (M) = P (M|H)P (H) + P (M| ~ H)P (~ H) Therefore, we need P (H):

P (H) = P (H|J, F)P (J)P (F) + P (H| ~ J, F)P (~ J)P (F) + P (H|J, ~ F)P (J)P (~ F) + P (H| ~ J, ~ F)P (~ J)P (~ F)

= 0.9(0.1)(0.8) + (0.7)(0.9)(0.8) + (0.8)(0.1)(0.2) + (0.1)(0.9)(0.2) = 0.61 Hence

P (M) = (0.7)(0.61) + (0.2)(0.39) = 0.505

(b) We need to show P (F |H) > P (F |H, J). By Bayes’ Rule:

P (H|F)P (F)

P (H)

But by the law of total probability:

P (H|F) = P (H|J, F)P (J) + P (H| ~ J, F)P (~ J) = (0.9)(.1) + (0.7)(0.9) = 0.72

Therefore:

(0.72)(0.8)

0.61

P (H, J) P (H|J)P (J) P (H|J)

On the other hand P (F |J) = P (F), because F and J are independent. Also, by the law of total probability, P (H|J) = P (H|J, F)P (F) + P (H|J, ~ F)P (~ F) = (0.9)(0.8) + (0.8)(0.2) = 0.88.

Therefore,

P (H|J, F)P (F |J) P (H|J, F)P (F) (0.9)(0.8)