闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

COMPSCI 752

Big Data Management

Assignment 3 / Semester 1, 2023

Knowledge Graph and Fundamentals Problems

1 Knowledge graph [25 marks]

We will build a knowledge graph based on the text from Ninh’s homepage.

"Prior to joining University of Auckland in December 2018, Ninh worked in

Copenhagen for 7 years at the University of Copenhagen and IT University of Copenhagen . He received his PhD at IT University of Copenhagen under the supervision of Professor Rasmus Pagh in 2014 . After that, he spent 4 years in postdoctoral positions in Copenhagen . He was the recipient of the best paper awards in WWW Conference 2014 and PKDD 2020 . AMiner has recognized him as the 2022 AI 2000 Most Influential Scholar Honorable Mention in Data Mining (Rising Star) for his outstanding and vibrant contributions to this field between 2012 and 2021 . "

1. Unsupervised method: Assume that nouns will be entities, and verbs form relations. Using NLP techniques (e.g. nltk package), write a small Python script to parse the above text into entities and relationships. [7 marks]

Construct a knowledge graph based on the parsing result. [3 marks]

2. Supervised method: Using a pre-trained model (e.g. https://spacy .io/models/ en) to parse the above text for entities and verbs. [7 marks]

Assume that verbs form relations, construct a knowledge graph based on the pars- ing result. [3 marks]

3. If we use some speciﬁc nouns as verbs, e.g. supervision, award, contribution, how do the constructed knowledge graphs above change? [5 marks]

Though handwriting graphs are acceptable (without any mark deduction), you are en- couraged to ﬁnd suitable tools to draw knowledge graphs.

Solutions:

See the Python code attached.

Note that the goal of this question is to motivate students on using public Python libraries for NLP tasks (nltk or spacy). Student’s solutions might be diﬀerent de- pending on their parsing results. It is ﬁne given the knowledge graph has most of basic information.

2 Odd sketch [15 marks]

Given two set A and B, the symmetric diﬀerence size between A and B is lAl + lBl - 2lA n Bl. Consider 4 sets below:

S1 = {1, 2, 4, 6, 8} ,

S2 = {1, 2, 4, 6, 9} ,

S3 = {1, 2, 4, 7, 9} ,

S4 = {1, 3, 5, 7, 9} .

1. Finding all pairs of sets such that their symmetric diﬀerence size is at most 2. [3 marks]

Solution: Simply compute all pairs of similarity using the set diﬀerence size, we have

sym(S1, S2) = 2 sym(S2, S3) = 2 sym(S3, S4) = 4

sym(S1, S3) = 4

sym(S2, S4) = 6

sym(S1, S4) = 8

We have the similar pairs {(S1, S2), (S2, S3)}.

2. Using the hash function h(x) = x + 1 mod 4, construct odd sketches of 4 bits length for these sets above. [8 marks]

Solution: The odd sketches of the sets are

S1 = {0, 0, 1, 0}

S2 = {0, 1, 0, 0}

S3 = {1, 1, 0, 1}

S4 = {0, 0, 1, 0}

3. How can we use these odd sketches to solve the question 1? Explain your solution. [4 marks]

Solution: We observe that if the set diﬀerence size of two sets is at mОst 2, the odd sketches of these two sets will be diﬀerent at mОst 2 positions.

For example, considering two sets A1 = {1, 2, 4, 6}, A2 = {1, 2, 4, 6}, there odd sketches are identical. Now consider S1 = A1 n {8}, S2 = A2 n {9}. Since sym(S1, S2) = 2, there are mauimum two positions that their odd sketches are diﬀerent (i.e. when 8 and 9 hash into diﬀerent positions).

Hence we can use odd sketches to answer the question 1 by considering all pairs whose odd sketches are diﬀerent at most 2 positions. Using their odd sketches, we have the result {(S1, S2), (S1, S4), (S2, S3), (S2, S4)}.

There might be the false positives introduced by Odd Sketches, e.g. (S1, S4). It should be ﬁne given the eﬃciency of binary operations. You might ﬁlter out the false positives by computing the symmetric diﬀerence of pairs returned by Odd Sketches or increasing the size of Odd Sketches.

3 PageRank [10 marks]

We ﬁrst use Graph 1 in Figure 1 for executing the PageRank algorithm with the damping factor d = 1.

Figure 1: Graph 1

1. Compute the transition matrix M for Graph 1. [1 mark]

2. Compute the PageRank of each vertex until it converges. [4 marks]

3. We remove the edge from E to A and C to E in Graph 1 to form Graph 2 in Figure 2. Assume the sum of values of the initialized page rank vector uО is 1. We run PageRank on Graph 2 until it converges. How can we initialize uО such that, in the end, node E has

(a) the largest page rank.

(b) the smallest page rank.

Explain the result. [5 marks]

Figure 2: Graph 2

Solution:

z 0 0 0 0 1r

'(')1/3 0 0 1/2 0'(')

1. M = '(')1/3 0 0 1/2 0 '(')

' '

' 0 0 1 0 0'

2. We compute u = MuО until it converges to {0.1875, 0.1875, 0.1875, 0.2500, 0.1875}.

3. We construct new transition matrix.

z 0 0 0 0 0r

'(')1/3 0 0 1/2 0'(')

M\ = '(')1/3 0 0 1/2 0'(') .

' '

' 0 0 0 0 0'

Whatever we initialize the vector uО , there are always lost of contributions due to the ﬁrst zero-row and last zero-row. In the end, all nodes have the same page rank, which is 0.

2023-06-08

Java

物理(Physical)

LINUX

C++

Python

Processing

sas

ios

maths

maple

C语言