MA6529/20

STATISTICAL LEARNING

SECTION A

The questions in this section will each be marked out of 10. Candidates may attempt all SIX questions but are advised that they cannot obtain a total of more than FIFTY MARKS on this section.

1. Two measurements were collected on each of 36 flea-beetles; 18 of the beetles were from a species called Chaetocnema concinna and the other 18 were from another species called Chaetocnema heikertingeri. The first variable, x1, consisted of the sum of widths (in micrometers) of the first joints of the first two tarsi ("feet"); and the second variable, x2, consisted of the corresponding sum for the second joints. The sample means of the two species are x1 = \ and x2 = \, respectively; and the pooled sample covariance matrix and its inverse are given by

S = ╱ 64(165)835(.260)

\ ,

S − 1 = \ .

(a) It is of interest to know whether or not the population means of the two species are different. Use Hotelling's T2 to test the null hypothesis of no difference. [ 7 marks ]

(b) What are the assumptions needed to use the test in part (a)? [ 3 marks ]
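
As a rough illustration of the calculation in part (a), the sketch below computes the two-sample Hotelling T2 statistic and its F transformation from two mean vectors and the inverse of the pooled covariance matrix. The function name `hotelling_two_sample` is hypothetical, and the inputs in the usage lines are placeholders only, since the numerical values of the means and of S are not available in this copy of the question.

```python
import numpy as np

def hotelling_two_sample(mean1, mean2, pooled_cov_inv, n1, n2):
    """Return (T2, F, df1, df2) for a two-sample Hotelling test.

    Assumes a common covariance matrix, estimated by the pooled matrix
    whose inverse is passed in as pooled_cov_inv.
    """
    d = np.asarray(mean1, dtype=float) - np.asarray(mean2, dtype=float)
    p = d.size
    s_inv = np.asarray(pooled_cov_inv, dtype=float)
    # T2 = (n1 n2 / (n1 + n2)) * d' S^{-1} d
    t2 = (n1 * n2) / (n1 + n2) * float(d @ s_inv @ d)
    # Under H0, F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T2 ~ F(p, n1 + n2 - p - 1)
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, df1, df2

# Placeholder inputs only (NOT the values from the question).
t2, f_stat, df1, df2 = hotelling_two_sample([1.0, 0.0], [0.0, 0.0], np.eye(2), 18, 18)
print(t2, f_stat, df1, df2)
```

With n1 = n2 = 18 and p = 2, the F statistic would be compared with an F distribution on 2 and 33 degrees of freedom.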

2. (a) A sample of customers were asked to score movies and each customer’s average scores for three diﬀerent genres of movies (Action, Comedy and Romance) were calculated. The sample correlation matrix was

Action Comedy Romance

Action 1 0.63 -0.58

Comedy 0.63 1 -0.34

Romance -0.58 -0.34 1

Calculate the partial correlation coefficient between Comedy and Romance given Action. What does this indicate about the relationship between the average scores for these genres?

[ 5 marks ]
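
A minimal sketch of the arithmetic, assuming the standard first-order partial correlation formula r_xy.z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz^2)(1 − r_yz^2)). The function name `partial_corr` is illustrative; the three correlations are taken from the matrix above.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y given z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = Comedy, y = Romance, z = Action
r_comedy_romance_given_action = partial_corr(r_xy=-0.34, r_xz=0.63, r_yz=-0.58)
print(round(r_comedy_romance_given_action, 3))
```

The result is close to 0.04, far smaller in magnitude than the marginal correlation of −0.34.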

(b) Suppose that we have three random variables X1 , X2 and X3. Explain how the partial correlation coeﬃcient between X1 and X2 given X3 can be calculated using linear regression. [ 5 marks ]
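
One way to see the regression route numerically is sketched below: regress X1 on X3 and X2 on X3, then correlate the two sets of residuals. The data are simulated purely for illustration (the coefficients 0.8 and −0.5 are arbitrary assumptions), and the helper `residuals` is a hypothetical name.

```python
import numpy as np

def residuals(y, x):
    """Residuals from the least-squares regression of y on x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

rng = np.random.default_rng(0)  # simulated data, not from the exam
x3 = rng.normal(size=500)
x1 = 0.8 * x3 + rng.normal(size=500)
x2 = -0.5 * x3 + rng.normal(size=500)

# Partial correlation as the correlation between the two residual vectors
partial_via_regression = np.corrcoef(residuals(x1, x3), residuals(x2, x3))[0, 1]

# The same quantity from the pairwise sample correlations
r = np.corrcoef([x1, x2, x3])
partial_via_formula = (r[0, 1] - r[0, 2] * r[1, 2]) / np.sqrt(
    (1 - r[0, 2] ** 2) * (1 - r[1, 2] ** 2)
)
print(partial_via_regression, partial_via_formula)
```

The two printed values should agree up to floating-point error.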

3. Data were collected on 406 cars. We will consider ﬁve variables: Engine displacement, horsepower, weight, acceleration, miles per gallon (MPG). The variables were divided into two groups: physical characteristics,

X 1 = (Displacement, Horsepower, Weight),

and performance characteristics,

X 2 = (Acceleration, MPG).

The data were analysed using canonical correlation analysis. The first two pairs of canonical correlation vectors were

a1 = (−0.262, 0.777, −0.021), b1 = (−0.460, 0.715),

and

a2 = (0.500, 1.575, −2.274), b2 = (−1.004, 0.841).

The canonical correlations were 0.88 and 0.63. Interpret these results. [ 10 marks ]

4. Consider the graph:

Answer the following questions:

(a) Is the graph complete? Justify your answer. [ 2 marks ]

(b) Are (X1, X2, X3) and (X3, X5, X6) paths? Justify your answer. [ 2 marks ]

(c) List the set of maximal cliques in the graph and use them to factorize the joint probability distribution of (X1, X2 , X3 , X4 , X5 , X6 ). [ 4 marks ]

(d) Provide the definition of a decomposable graph. [ 2 marks ]

5. Answer the following questions about mixture models:

(a) Provide the definition of mixture models. [ 2 marks ]

(b) Motivate the use of mixture models by using some examples. [ 2 marks ]

(d) Explain how to complete the data in order to perform the EM algorithm. [ 3 marks ]

6. (a) Explain what a distance matrix and a similarity matrix are.

(b) Explain how to transform a similarity matrix into a distance matrix.

SECTION B

These questions will each be marked out of 25. Candidates may not attempt more than TWO of the THREE questions.

7. (a) Suppose we have a dataset containing n observations and each of the observations is a p-dimensional vector xi = [xi1, . . . , xip]T (i = 1, . . . , n). Describe how to obtain loadings and scores in principal component analysis for this dataset. Explain the geometric meaning of the loadings and the scores. [ 10 marks ]

(b) Suppose we denote the covariance matrix of the original data as SX. The eigenvectors of SX are columns of the matrix A and the eigenvalues of SX are the diagonal elements of the diagonal matrix Λ. Show that the principal component scores are uncorrelated. [ 5 marks ]
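
The identity behind part (b), namely that the covariance matrix of the scores is AT SX A = Λ, can be checked numerically. The sketch below uses simulated data (an assumption, purely for illustration) and `numpy.linalg.eigh`, which returns the eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(1)  # simulated data, not from the exam
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing       # correlated columns

S_X = np.cov(X, rowvar=False)                # sample covariance of the data
eigvals, A = np.linalg.eigh(S_X)             # columns of A are eigenvectors of S_X

Z = (X - X.mean(axis=0)) @ A                 # principal component scores
S_Z = np.cov(Z, rowvar=False)                # covariance of the scores

# Off-diagonal entries of S_Z should vanish; the diagonal holds the eigenvalues
off_diagonal = S_Z - np.diag(np.diag(S_Z))
print(np.max(np.abs(off_diagonal)))
print(np.diag(S_Z), eigvals)
```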

(c) For each of the 50 states in the United States, the dataset contains the number of arrests per 100,000 residents for each of three crimes: Assault, Murder, and Rape. We also record UrbanPop (the percent of the population in each state living in urban areas). The loadings of the ﬁrst two principal components are

	PC1	PC2
Murder	0.54	-0.42
Assault	0.58	-0.19
UrbanPop	0.28	0.87
Rape	0.54	0.17

The eigenvalues of the correlation matrix are 2.48, 0.99, 0.36 and 0.17.

(i) Provide an interpretation of the ﬁrst two principal components. [ 5 marks ]

(ii) Draw a scree plot and discuss the number of principal components that you would use. [ 5 marks ]
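
The quantities usually read off alongside a scree plot are the proportion and cumulative proportion of variance explained. A short sketch using the four eigenvalues quoted above (the variable names are illustrative):

```python
eigenvalues = [2.48, 0.99, 0.36, 0.17]

total = sum(eigenvalues)                      # equals the number of variables, 4
proportions = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for prop in proportions:
    running += prop
    cumulative.append(running)

for k, (ev, prop, cum) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"PC{k}: eigenvalue={ev:.2f}  proportion={prop:.4f}  cumulative={cum:.4f}")
```

The first two components together account for about 87% of the total variance, and the scree plot is simply these eigenvalues plotted against the component number.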

8. In a study on diabetes, we aim to identify people with high risk of diabetes. The patient records containing eight variables were obtained for two classes of people: 45 records for healthy individuals (class 0) and 25 records for individuals with a high risk of diabetes (class 1).

(a) Assume that the two classes follow multivariate normal distributions with the same covariance matrix, MN(µ0, Σ) and MN(µ1, Σ). A new sample Xnew can be classified using a linear discriminant function of the form

aT (Xnew − b).

Derive aT and b using the maximum likelihood discriminant rule. [ 15 marks ]

(b) To simplify the study, the original data were transformed using principal component analysis. The first two principal components are used in the analysis, instead of the original data. The sample mean vectors of the two classes are

x̄0 = (−0.40, −0.20)T and x̄1 = (0.75, 0.36)T.

The sample covariance matrices of the two classes are

0 = ┐ and 1 = ┐ .

Using the above information, show that the estimate of Σ in (a) is = ┐ .

[ 5 marks ]

(c) Given the information in (b) and that − 1 = ┐ , calculate the linear discriminant function for the diabetes data. Classify the person with feature vector (−0.45, 0.15)T into one of the two classes. [ 5 marks ]
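
The mechanics of parts (b) and (c) can be sketched as follows, under stated assumptions: the pooled estimate is taken to be the unbiased one with divisor n0 + n1 − 2 (a maximum likelihood estimate would divide by n0 + n1 instead), and the sign convention is a = Σ^{-1}(µ1 − µ0), b = (µ0 + µ1)/2, with a positive value of aT (x − b) allocating to class 1. All function names are hypothetical. The covariance matrices are not available in this copy of the question, so the usage lines substitute an identity matrix as a placeholder, and the printed result is not the answer to the question.

```python
import numpy as np

def pooled_covariance(S0, S1, n0, n1):
    """Pooled covariance estimate (unbiased version, divisor n0 + n1 - 2)."""
    S0 = np.asarray(S0, dtype=float)
    S1 = np.asarray(S1, dtype=float)
    return ((n0 - 1) * S0 + (n1 - 1) * S1) / (n0 + n1 - 2)

def linear_discriminant(mean0, mean1, sigma_inv):
    """Return (a, b) such that a'(x - b) > 0 allocates x to class 1."""
    m0 = np.asarray(mean0, dtype=float)
    m1 = np.asarray(mean1, dtype=float)
    a = np.asarray(sigma_inv, dtype=float) @ (m1 - m0)
    b = 0.5 * (m0 + m1)
    return a, b

def classify(x_new, a, b):
    """Class label (0 or 1) from the sign of the discriminant function."""
    return 1 if float(a @ (np.asarray(x_new, dtype=float) - b)) > 0 else 0

# Means from the question; identity matrix is a PLACEHOLDER for the missing inverse.
a, b = linear_discriminant([-0.40, -0.20], [0.75, 0.36], np.eye(2))
label = classify([-0.45, 0.15], a, b)
print(a, b, label)
```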

9. Eight objects, labeled A, B, C, D, E, F, G and H, have measures of dissimilarity between them assessed as shown below.

	A	B	C	D	E	F	G	H
A	0	57	105	95	100	93	89	51
B	57	0	104	76	92	83	78	37
C	105	104	0	73	99	102	129	121
D	95	76	73	0	40	49	49	57
E	100	92	99	40	0	72	52	74
F	93	83	102	49	72	0	34	60
G	89	78	129	49	52	34	0	56
H	51	37	121	57	74	60	56	0

(a) Demonstrate the complete-link cluster analysis procedure by calculating the matrix showing dissimilarities between clusters in a solution with ﬁve clusters; that is, perform three iterations of the procedure of aggregating clusters. [ 13 marks ]

(b) State the main diﬀerence between the single-link cluster analysis and the complete-link cluster analysis. Illustrate this through the formation of six clusters in a single-link cluster analysis. [ 12 marks ]
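
The agglomeration steps in both parts can be traced with a small sketch. It assumes the usual definitions: complete link takes the largest pairwise dissimilarity between two clusters, single link the smallest. The function name `agglomerate` and its return format are illustrative choices, and the matrix is copied from the table above.

```python
LABELS = list("ABCDEFGH")
D = [
    [0, 57, 105, 95, 100, 93, 89, 51],
    [57, 0, 104, 76, 92, 83, 78, 37],
    [105, 104, 0, 73, 99, 102, 129, 121],
    [95, 76, 73, 0, 40, 49, 49, 57],
    [100, 92, 99, 40, 0, 72, 52, 74],
    [93, 83, 102, 49, 72, 0, 34, 60],
    [89, 78, 129, 49, 52, 34, 0, 56],
    [51, 37, 121, 57, 74, 60, 56, 0],
]

def agglomerate(dist, labels, n_clusters, linkage):
    """Merge clusters until n_clusters remain.

    linkage=max gives complete link, linkage=min gives single link.
    Returns (merge history, cluster names, between-cluster matrix).
    """
    clusters = [(i,) for i in range(len(labels))]
    merges = []

    def between(c1, c2):
        return linkage(dist[i][j] for i in c1 for j in c2)

    def name(cluster):
        return "".join(labels[k] for k in cluster)

    while len(clusters) > n_clusters:
        # Closest pair of clusters under the chosen linkage
        d, i, j = min(
            (between(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append((name(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    names = [name(c) for c in clusters]
    matrix = [
        [0 if r == c else between(clusters[r], clusters[c]) for c in range(len(clusters))]
        for r in range(len(clusters))
    ]
    return merges, names, matrix

complete_merges, complete_names, complete_matrix = agglomerate(D, LABELS, 5, max)
single_merges, single_names, single_matrix = agglomerate(D, LABELS, 6, min)

print(complete_merges, complete_names)
for row in complete_matrix:
    print(row)
print(single_merges, single_names)
for row in single_matrix:
    print(row)
```

The merge history lists each new cluster with the dissimilarity at which it was formed, which can be compared with a hand calculation step by step.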