AMA546 Statistical Data Mining

Assignment 1

2022

Please submit your homework to the Blackboard system. You may submit your solutions several times to correct mistakes, but please make sure that each submission is complete. For example, if you find a mistake in your solution to Q1 and plan to correct it, please submit the corrected solution to Q1 together with your solutions to Q2 through Q7. Please start early; late submissions will not be accepted.

Please submit the PDF files to the Blackboard system. The markers of the assignments are research students from the Department of AMA. Please show your courtesy to them by … is not difficult, like

. . . therefore the desired probability is $\boxed{P(A) = 0.334}$.

In all computations, if you need to round a number, please keep four decimal places, e.g. 142.8571 or 0.0029.

Full marks: 100. For all the problems, please give detailed steps to get full marks.

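1. Let $\lambda > 0$. Define

$x^*(\lambda) = \arg\min_{x \in \mathbb{R}} \{(x - 1)^2 + \lambda|x|\}.$

That is, $x^*(\lambda)$ is the number such that for any $x \in \mathbb{R}$, $(x^*(\lambda) - 1)^2 + \lambda|x^*(\lambda)| \le (x - 1)^2 + \lambda|x|$.

(a) (5 marks) Prove that for $x \in \mathbb{R}$, if $x \neq x^*(\lambda)$, then $(x^*(\lambda) - 1)^2 + \lambda|x^*(\lambda)| < (x - 1)^2 + \lambda|x|$.

(b) (5 marks) Find $x^*(\lambda)$. Find all $\lambda$ such that $x^*(\lambda) = 0$.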
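A brute-force grid search can serve as a numerical sanity check for whatever expression you derive for $x^*(\lambda)$ in (b). This is only a sketch: the helper names `objective` and `grid_argmin`, the grid bounds and the grid resolution are illustrative assumptions, and a grid search approximates the minimizer without proving anything.

```python
import numpy as np

def objective(x, lam):
    """Penalized objective (x - 1)^2 + lam * |x| from Question 1."""
    return (x - 1.0) ** 2 + lam * np.abs(x)

def grid_argmin(lam, lo=-3.0, hi=3.0, num=600_001):
    """Approximate x*(lam) by evaluating the objective on an evenly spaced grid.

    The bounds [lo, hi] and the number of grid points are arbitrary choices;
    the result is accurate only up to the grid spacing (here 1e-5).
    """
    grid = np.linspace(lo, hi, num)
    return grid[np.argmin(objective(grid, lam))]
```

For example, compare `grid_argmin(lam)` with your closed-form candidate for several values of `lam`, including values on both sides of any threshold you find.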

2. (10 marks) Show that $\|\hat{\beta}^{\text{ridge}}\|$ increases as its tuning parameter $\lambda \to 0$. Does the same property hold for the Lasso estimator?
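The ridge part of the claim can be observed numerically before proving it. The sketch below assumes the no-intercept ridge estimator $\hat{\beta}^{\text{ridge}} = (Z^T Z + \lambda I)^{-1} Z^T y$ (the same objective as in Question 4) and uses hypothetical synthetic data; `ridge_coef` is an illustrative helper name, and a numerical check is of course not a proof.

```python
import numpy as np

def ridge_coef(Z, y, lam):
    """Ridge estimate (Z^T Z + lam I)^{-1} Z^T y, computed with a linear solve."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

# Hypothetical synthetic data, used only to illustrate the claim.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
y = Z @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

lams = [10.0, 1.0, 0.1, 0.01]  # lambda decreasing toward 0
norms = [np.linalg.norm(ridge_coef(Z, y, lam)) for lam in lams]
print(norms)  # the norms should not decrease as lambda shrinks
```

One route to a proof is to write $Z = UDV^T$ (SVD) and express $\|\hat{\beta}^{\text{ridge}}\|^2$ as a sum of terms, each of which depends on $\lambda$ in a simple way.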

3. Consider the following training data.

Name	Lay Eggs	Can Fly	Have Legs	Class
monkey	no	no	yes	mammals
python	yes	no	no	non-mammals
salmon	yes	no	no	non-mammals
whale	no	no	no	mammals
frog	yes	no	yes	non-mammals
lizard	yes	no	yes	non-mammals
bat	no	yes	yes	mammals
pigeon	yes	yes	yes	non-mammals
cat	no	no	yes	mammals
leopard shark	no	no	no	non-mammals
turtle	yes	no	yes	non-mammals
penguin	yes	no	yes	non-mammals
owl	yes	yes	yes	non-mammals
dolphin	no	no	no	mammals
eagle	yes	yes	yes	non-mammals
dog	no	no	yes	mammals

a. Use the Gini index to determine which attribute, among the three (Lay Eggs, Can Fly, and Have Legs), should be used first to build a decision tree. (5 marks)

b. Use the Gini index to determine which attribute(s) should be used in the second step. (5 marks)
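The computations in (a) and (b) reduce to two quantities: the Gini impurity $1 - \sum_k p_k^2$ of a node, and the size-weighted impurity of the child nodes after a split. A minimal sketch follows; the helper names are hypothetical, and it assumes the usual rule of preferring the split with the smallest weighted impurity (equivalently, the largest impurity reduction).

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(values, labels):
    """Weighted Gini impurity of the child nodes after splitting on one attribute.

    values[i] is the attribute value of record i and labels[i] is its class.
    """
    n = len(labels)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(v, []).append(lab)
    return sum(len(g) / n * gini(g) for g in groups.values())
```

For the second step, the same function can be applied to the records inside each child node produced by the first split.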

c. Build a naive Bayes classiﬁer with Laplace smoothing (set the smoothing parameter α = 1). In particular, let (l, c, h) be the attribute values of a new kind of animal, where $l, c, h \in \{-1, 1\}$, with 1 standing for “yes” and −1 for “no”. Find the constant $\alpha_0$ and the functions $g_1$, $g_2$ and $g_3$ of the generalized additive model form of the naive Bayes classiﬁer,

$g(l, c, h) = \alpha_0 + g_1(l) + g_2(c) + g_3(h),$

so that when $g(l, c, h) > 0$ we can infer that the new animal is a mammal, and when $g(l, c, h) < 0$ the inferred output is non-mammal. (10 marks)
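One way to obtain the additive form is to take the log-odds of the two classes: $\alpha_0$ is the log prior ratio and $g_j(v)$ is the log ratio of the smoothed class-conditional probabilities of attribute $j$ taking value $v$. The sketch below assumes smoothed estimates $(\text{count} + \alpha)/(n_c + 2\alpha)$ for binary attributes and **unsmoothed** class priors; whether your course also smooths the priors is an assumption to check. The function names are hypothetical.

```python
import math

def nb_additive_form(X, y, alpha=1.0, values=(-1, 1), pos=1):
    """Log-odds (generalized additive) form of a naive Bayes classifier.

    X: list of records, each a tuple of attribute values from `values`.
    y: list of class labels; `pos` marks the positive class (e.g. mammal).
    Returns (alpha0, g), where g[j][v] is the additive term for attribute j.
    Assumption: Laplace smoothing is applied to the class-conditional
    probabilities only; the class priors are plain relative frequencies.
    """
    pos_rows = [x for x, lab in zip(X, y) if lab == pos]
    neg_rows = [x for x, lab in zip(X, y) if lab != pos]
    n_pos, n_neg = len(pos_rows), len(neg_rows)
    k = len(values)  # number of possible values of each attribute
    alpha0 = math.log(n_pos / n_neg)
    g = []
    for j in range(len(X[0])):
        terms = {}
        for v in values:
            p_pos = (sum(r[j] == v for r in pos_rows) + alpha) / (n_pos + k * alpha)
            p_neg = (sum(r[j] == v for r in neg_rows) + alpha) / (n_neg + k * alpha)
            terms[v] = math.log(p_pos / p_neg)
        g.append(terms)
    return alpha0, g

def score(alpha0, g, x):
    """g(x) = alpha0 + sum_j g_j(x_j); a positive score means the `pos` class."""
    return alpha0 + sum(g[j][v] for j, v in enumerate(x))
```

Encoding the table's “yes”/“no” entries as 1/−1 and the classes as 1 (mammal) / 0 (non-mammal) lets you check a hand computation against this sketch.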

4. (10 marks) Consider the regularization problem $\min_{\beta} \{\|y - Z\beta\|_2^2 + \lambda\Omega(\beta)\}$ with penalty $\Omega(\cdot) = \|\cdot\|_2^2$ for ridge regression or $\Omega(\cdot) = \|\cdot\|_1$ for lasso. Below are two plots (Figure 1 and Figure 2) of the contours of some $f(\beta) = \|y - Z\beta\|_2^2$. On Figure 1, the contours of the penalty $\|\beta\|_2^2$ are attached. Please first draw the contours of the penalty $\|\beta\|_1$ on Figure 2, then draw the minimization paths on both plots, and mark on the paths which points correspond to $\lambda = 0$ and which correspond to the limit $\lambda \to \infty$. You may either draw your solutions directly onto the page of the PDF file, or print the pages, draw on paper, and scan (or photograph) your solutions.

5. Kernel method.

(a) (5 marks) Prove that $K(x, y) = x^3 y^3 + xy$ deﬁned on $\mathbb{R} \times \mathbb{R}$ is a positive semi-deﬁnite kernel.

(b) (5 marks) Prove that $G(x, y) = \langle x, y\rangle^3 + \langle x, y\rangle$ deﬁned on $\mathbb{R}^{10} \times \mathbb{R}^{10}$ is a positive semi-deﬁnite kernel.

(d) (5 marks) Let $K : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be deﬁned by

$K(x, u) = \cos(x - u).$

Is K a positive kernel? Please prove your conclusion. For your reference,

$\cos x = \frac{e^{ix} + e^{-ix}}{2}.$
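Before writing a proof, it can help to check positive semi-definiteness numerically: build the Gram matrix on some points and look at its smallest eigenvalue. A clearly negative value is a counterexample, while values that are non-negative up to rounding are only evidence, not a proof. The helper names and the random test points below are illustrative assumptions; `numpy.linalg.eigvalsh` is NumPy's eigenvalue routine for symmetric matrices.

```python
import numpy as np

def min_gram_eigenvalue(kernel, points):
    """Smallest eigenvalue of the Gram matrix [kernel(p_i, p_j)]_{i,j}."""
    K = np.array([[kernel(p, q) for q in points] for p in points], dtype=float)
    return np.linalg.eigvalsh(K).min()

def k_cubic_1d(x, y):
    """K(x, y) = x^3 y^3 + x y on R x R, as in part (a)."""
    return x**3 * y**3 + x * y

def g_cubic_10d(x, y):
    """G(x, y) = <x, y>^3 + <x, y> on R^10 x R^10, as in part (b)."""
    s = np.dot(x, y)
    return s**3 + s

rng = np.random.default_rng(0)
print(min_gram_eigenvalue(k_cubic_1d, rng.normal(size=30)))
print(min_gram_eigenvalue(g_cubic_10d, rng.normal(size=(30, 10))))
```

Expect tiny negative values of the order of floating-point rounding for low-rank Gram matrices such as the one from part (a).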

6. The following code is meant to draw independent random vectors from a two-dimensional normal distribution N(µ, Σ), with

$\mu = \begin{pmatrix} -2 \\ 3 \end{pmatrix}$, and Σ = … .

Suppose one has a sample $x_1, \ldots, x_n \in \mathbb{R}^2$ drawn independently from N(µ, Σ). For each $1 \le i \le n$, denote

$x_i = \begin{pmatrix} x_i^{(1)} \\ x_i^{(2)} \end{pmatrix}.$

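One may use the following estimators to verify the code:

$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \text{and} \quad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T. \qquad (1)$

It is expected that for large n, $\hat{\mu} \approx \mu$ and $\hat{\Sigma} \approx \Sigma$.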
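The estimators in (1) translate directly into NumPy. This sketch assumes the samples are stored one observation per row (shape `(n, 2)`), which is a choice made here and is independent of the output format required of genData; it also follows the $1/n$ normalization as written in (1). If your version of (1) divides by $n - 1$, change the divisor (that is `np.cov`'s default). `sample_mean_cov` is a hypothetical helper name.

```python
import numpy as np

def sample_mean_cov(samples):
    """Sample mean and covariance of the rows of `samples`, with 1/n scaling.

    samples: array of shape (n, 2), one observation x_i per row.
    """
    n = samples.shape[0]
    mu_hat = samples.mean(axis=0)
    centered = samples - mu_hat          # row i is x_i - mu_hat
    sigma_hat = centered.T @ centered / n
    return mu_hat, sigma_hat
```

If your data are stored one observation per column instead, pass the transpose.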

    import numpy as np

    np.random.seed(123)

    def genData(n):
        mu = np.array([-2.0, 3.0])
        A = np.array([[1.0, 2.0], [3.0, 4.0]])
        X = np.random.normal(size = n * 2)
        X = X.reshape((2, n))
        X = (X + mu).dot(A)
        return(X.T)

(a) (10 marks) The current version of the function genData does not work properly. Please ﬁnd the problem and ﬁx it (you may enter the function into Python 3, run it, and read the error message for a hint, but keep in mind that there might be a problem in the code even if it runs smoothly). Please keep your changes to the code minimal, so as to minimize the grading workload. Please do not change the output format: the function genData outputs a matrix X of dimension 2 × n, with

$X[j - 1, i - 1] = x_i^{(j)}, \quad 1 \le j \le 2, \ 1 \le i \le n.$

Please describe how you ﬁx the problem (e.g. “change the code line . . . to . . . ”).

(b) (5 marks) After ﬁxing genData, please run it with n = 10, and copy $x_1$ and $x_2$ to your solutions. Please set the random seed 123 before running genData so as to be consistent with the model solutions.

… the estimators in (1) to compute $\hat{\mu}$ and $\hat{\Sigma}$. Please copy $\hat{\mu}$ and $\hat{\Sigma}$ to your solutions.

(d) (10 marks) Set the random seed 123 and generate 1000 data points $\{x_i\}$ with the function genData. Then, form the 1000 × 1000 Gram matrix K with the kernel

$K(x, u) = \exp(-\|x - u\|^2).$

Find the largest three eigenvalues of K. Here we use the Euclidean norm $\|\xi\| := \sqrt{\xi_1^2 + \xi_2^2}$ for $\xi = (\xi_1, \xi_2)^T \in \mathbb{R}^2$. Instead of numpy.linalg.eig, you may use the function numpy.linalg.eigh to compute the eigenvalues.
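A vectorized way to form this Gram matrix uses $\|x - u\|^2 = \|x\|^2 + \|u\|^2 - 2\langle x, u\rangle$, which avoids an explicit double loop over 1000 × 1000 pairs. The sketch assumes the points are passed as an `(n, 2)` array with one point per row (pass the transpose if your points are stored as columns); the helper names are hypothetical. It uses `numpy.linalg.eigvalsh`, the eigenvalues-only counterpart of `numpy.linalg.eigh`, which returns eigenvalues in ascending order.

```python
import numpy as np

def gaussian_gram(points):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2) for the rows x_i of `points`."""
    sq_norms = np.sum(points**2, axis=1)
    # ||x - u||^2 = ||x||^2 + ||u||^2 - 2 <x, u>; clip tiny negatives from rounding.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    return np.exp(-np.maximum(sq_dists, 0.0))

def top_eigenvalues(K, k=3):
    """Largest k eigenvalues of a symmetric matrix, in descending order."""
    return np.linalg.eigvalsh(K)[::-1][:k]
```

As a quick consistency check, the eigenvalues of K should sum to its trace, which equals the number of points because $K(x, x) = 1$.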

2022-10-22
