Math 123 Problem Set 5
Due: Monday 4/26 at 11:59 p.m. EST
Instructions: Read the homework policy. For problem 1, include your code with your final homework submission. You should submit a PDF copy of the homework and any associated code on Canvas. Your PDF must be a single file and not multiple images.
1. We consider the clustering of the two-moon data. The following problems should be implemented in a programming language of your choice.
(a) Load the data Moon.mat and Moon_Label.mat on Canvas (CSV files are also available).
Using the Gaussian kernel

    w_ij = exp( -||x_i - x_j||_2^2 / (2σ^2) ),

construct the Laplacian. Note that x_i refers to the i-th data point.
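A minimal sketch of one way to build the weight matrix and a graph Laplacian in Python, assuming the data is loaded into an n × 2 NumPy array X; the function name is illustrative, and the unnormalized Laplacian L = D - W is just one common choice (the problem does not specify which Laplacian to use):

```python
import numpy as np

def gaussian_laplacian(X, sigma):
    """Build the Gaussian-kernel weight matrix W and the
    unnormalized graph Laplacian L = D - W for data X (n x d)."""
    # Pairwise squared Euclidean distances ||x_i - x_j||^2
    sq = np.sum(X**2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist2 = np.maximum(dist2, 0.0)            # guard against round-off
    W = np.exp(-dist2 / (2.0 * sigma**2))     # Gaussian kernel weights w_ij
    D = np.diag(W.sum(axis=1))                # degree matrix
    return D - W                              # unnormalized Laplacian
```

Each row of L sums to zero by construction, which is a quick sanity check on the implementation.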
(b) Describe the connectivity of the graph in the limiting cases (i) as σ → ∞ and (ii) as σ → 0.
(c) Run K-means clustering on the spectral embedding of the data. Use as the spectral embedding the 2nd and 3rd eigenvectors of the Laplacian, corresponding to the two smallest nonzero eigenvalues. Experiment with different values of σ. For your optimal result, plot the data and indicate the labels using colors (e.g., points assigned to cluster 1 will be labeled red and points assigned to cluster 2 will be labeled blue).
(d) For your optimal result, plot the spectral embedding of the data.
(e) What does your result in (c) tell you?
Remark: For a given choice of σ, you should run your K-means algorithm with different initializations. For this problem, you can use an inbuilt k-means function in MATLAB or Python.
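The pipeline in (c) might be sketched as follows, assuming a symmetric Laplacian L has already been built as in (a). A small hand-rolled k-means (with the multiple random initializations suggested in the remark) stands in here for an inbuilt k-means function; all names are illustrative:

```python
import numpy as np

def kmeans(X, k=2, n_init=10, iters=100, seed=0):
    """Plain k-means, keeping the best of n_init random initializations."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = d.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def spectral_cluster(L, k=2):
    """Embed via the 2nd and 3rd eigenvectors of L, then run k-means."""
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    embedding = vecs[:, 1:3]         # 2nd and 3rd eigenvectors as columns
    return kmeans(embedding, k=k), embedding
```

The returned embedding is what part (d) asks you to plot.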
2. Consider a set of n distinct data points in R^2, {(x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n)}, with n ≥ 3. We want to fit the data using quadratic regression in the least squares sense. The corresponding minimization problem is defined below.
    min_c Σ_{i=1}^n ( y_i - [c_0 + c_1 x_i + c_2 x_i^2] )^2
(a) Prove that the above minimization problem is equivalent to the following optimization program
    min_c ||y - Ac||_2^2,

where c = (c_0, c_1, c_2)^T, y = (y_1, y_2, ..., y_n)^T, and A is the n × 3 matrix whose i-th row is (1, x_i, x_i^2).
(b) Prove that the least squares quadratic regression solution satisfies the equation A^T A c = A^T y.
(c) Given the equation in (b), a solution for the least squares quadratic regression is c* = (A^T A)^{-1} A^T y. What is the necessary and sufficient condition for A^T A to be invertible?
(d) Prove that A has full column rank.
[Remark: The n data points are distinct. ]
[Hint: Show that Ac = 0 has only the trivial solution. You can use the fundamental theorem of algebra.]
(e) Find the solution to the equation in (b) using the singular value decomposition of A.
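For checking your answer to (e) numerically, the SVD-based solution can be sketched in Python as follows; the function name is illustrative, and the formula assumes A has full column rank (part (d)) so all singular values are nonzero:

```python
import numpy as np

def quadratic_fit_svd(x, y):
    """Solve min_c ||y - Ac||_2^2 for the quadratic model via the SVD of A."""
    A = np.column_stack([np.ones_like(x), x, x**2])   # rows (1, x_i, x_i^2)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: A = U S V^T
    # c* = V diag(1/s) U^T y, the pseudo-inverse applied to y
    return Vt.T @ ((U.T @ y) / s)
```

Fitting data generated from a known quadratic should recover its coefficients exactly, which is a useful test of the implementation.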
3. [Extra credit: 20 points] The curious incident of memory to compute Ax. Consider the following two ways to compute the product b = Ax of a matrix A ∈ R^{n×n} and a vector x ∈ R^n. The first computation is column based and is defined as follows:
    b = Σ_{j=1}^n x_j a_j,
where a_j denotes the j-th column of A. The second method is row based and is defined as follows:
    b_j = (a^j)^T x   for all 1 ≤ j ≤ n,
where (a^j)^T denotes the j-th row of A, i.e., A is the matrix whose rows, from top to bottom, are (a^1)^T, (a^2)^T, ..., (a^n)^T.
Remark: In MATLAB, A(:,j) returns the j-th column and A(j,:) returns the j-th row.
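If you work in Python rather than MATLAB, the two algorithms might be sketched as follows (the function names are illustrative; a MATLAB version would use A(:,j) and A(j,:) as in the remark):

```python
import numpy as np

def matvec_columns(A, x):
    """Column-based: b = sum_j x_j * a_j, where a_j is the j-th column."""
    b = np.zeros(A.shape[0])
    for j in range(A.shape[1]):
        b += x[j] * A[:, j]          # accumulate scaled columns
    return b

def matvec_rows(A, x):
    """Row-based: b_j = (a^j)^T x, where (a^j)^T is the j-th row."""
    b = np.zeros(A.shape[0])
    for j in range(A.shape[0]):
        b[j] = A[j, :] @ x           # dot product of row j with x
    return b
```

Both should agree with the built-in product A @ x up to floating-point round-off.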
(a) Using a programming language of your choice, implement the above two algorithms.
(b) For n = 5000, generate a uniform random matrix A and a random vector x, and use your two algorithms to compute the product b. Repeat the experiment 30 times and report the average time for each method.
(c) Repeat the experiment in (b) for n = 10000, 15000, 20000. Do you see a difference in the average time using the column-based versus the row-based method? If so, explain your results.
Remark: In MATLAB, you can use rand(n,n) to generate a random matrix and rand(n,1) to generate a random vector. You can use the commands tic and toc to measure the time it takes for the function to compute b. For example, tic; A*x; toc; measures the time it takes to compute the matrix-vector product using the built-in function in MATLAB.
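In Python, a timing harness for parts (b) and (c) might look like the following, with time.perf_counter playing the role of MATLAB's tic/toc; the function name is illustrative, and a small n is used here so the example runs quickly (the problem asks for n = 5000 and up):

```python
import time
import numpy as np

def average_time(matvec, A, x, reps=30):
    """Average wall-clock time of matvec(A, x) over `reps` repetitions."""
    total = 0.0
    for _ in range(reps):
        t0 = time.perf_counter()     # analogue of MATLAB's tic
        matvec(A, x)
        total += time.perf_counter() - t0   # analogue of toc
    return total / reps

# Example usage with a small n; substitute your two implementations
# from (a) for the built-in product when running the experiment.
n = 500
A, x = np.random.rand(n, n), np.random.rand(n)
t_builtin = average_time(lambda A, x: A @ x, A, x)
```

Averaging over repetitions, as the problem requires, smooths out run-to-run variation from caching and background load.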