COMP90042
Natural Language Processing
Semester 1 2020
Section A: Short Answer Questions [13 marks]
Answer each of the questions in this section as briefly as possible. Expect to answer each sub-question in no more than a line or two.
Question 1: General Concepts [7 marks]
a) “Sparsity” is a key problem in text processing; explain what is meant by “sparsity” and outline an important impact this has on an “N-gram language model”. [2 marks]
b) Contrast the evaluation metrics “BLEU” and “ROUGE”. Identify one similarity and one key difference. [2 marks]
c) Explain what is meant by the “contextual representation” of a word. Why might contextual representations be more useful than Word2Vec embeddings? [2 marks]
d) A copy mechanism is introduced to encoder-decoder models for abstractive summarisation. Would a copy mechanism help encoder-decoder models for translation? Explain. [1 mark]
Question 2: Formal Language Theory [3 marks]
a) “Regular languages” are closed under intersection. Explain what this means, and why this is important for language processing. [2 marks]
b) Describe how “finite state transducers” differ from “finite state automata”. [1 mark]
Question 3: Topic Models [3 marks]
a) Name two hyper-parameters in “Latent Dirichlet Allocation”, and describe how each influences the model in terms of topic quality and output distributions. [2 marks]
b) Explain why it is difficult to evaluate topic models, and name one method for evaluation of topic models. [1 mark]
Section B: Method Questions [15 marks]
In this section you are asked to demonstrate your conceptual understanding of the methods that we have studied in this subject.
Question 4: Lexical semantics [5 marks]
a) Based on the following WordNet-style graph of hypernymy:
[Figure: hypernym chain entity → artifact → tool → fork1, with meronym (“part”) edges from fork1 to handle and blade]
where artifact means “a man-made object”, and fork1 means “an agricultural tool used for lifting or digging”. Insert the following lemmas: fork2, knife, cutlery (in their kitchenware senses), and spade. You are free to alter the structure of the graph to accommodate the new lemmas. Add distinguished edges (e.g., dashed) for at least two meronym relations. [2 marks]
b) Describe the “word2vec skip-gram” and “Latent Semantic Analysis” methods for learning vector representations of words. Compare and contrast these methods, considering their underlying modelling intuitions, data requirements, and algorithms for implementation. [3 marks]
Question 5: Information Extraction [4 marks]
Consider the following document, composed of only one sentence, its corresponding Named Entity annotation, and a gold set of relations extracted from it:
● Hugh Jackman is an actor born in 1968 in Sydney, NSW, Australia.
[Hugh Jackman]PER is an actor born in [1968]TIME in [Sydney]LOC, [NSW]LOC, [Australia]LOC.
● Gold relations:
– year-of-birth(Hugh Jackman, 1968)
– place-of-birth(Hugh Jackman, Sydney)
– city-state(Sydney, New South Wales)
– state-country(New South Wales, Australia)
a) Suppose you want to train a Named Entity Recogniser using a Hidden Markov Model. Rewrite the named entity annotated sentence into a sequence of (word, tag) elements using one of the schemes you learned in class. Write your answer in the following format: word1/tag1 word2/tag2 ... [2 marks]
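For reference, the kind of tagging scheme the question refers to (here, BIO) can be sketched with a short helper. This is an illustrative sketch using a different sentence, not a model answer; the function name and span format are ours:

```python
# Illustrative BIO conversion: map entity spans over tokens to word/tag pairs.
def to_bio(tokens, spans):
    """tokens: list of words; spans: list of (start, end, type), end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # continuation tokens
    return " ".join(f"{w}/{t}" for w, t in zip(tokens, tags))

tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
spans = [(0, 2, "PER"), (5, 6, "LOC")]
print(to_bio(tokens, spans))
# Marie/B-PER Curie/I-PER was/O born/O in/O Warsaw/B-LOC
```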
b) The first step in Relation Extraction is to build a binary classifier that recognises if two entities have a relation or not. Assuming the example above is the only data you have available, how many positive and how many negative instances would you have in your training set for this classifier? [1 mark]
c) The second step in Relation Extraction is to build a multi-class classifier that, given a positive entity pair, predicts the relation between them. However, even if you have a perfect classifier, the relations extracted from the sentence will not match the gold relations given above. Why is this the case? How would you solve this problem, so the relations match the gold standard? [1 mark]
Question 6: Dependency Grammar [6 marks]
a) Describe what it means for two words to be in a “dependency” relation, and provide an example. [1 mark]
b) Show the dependency parse for the sentence
And I would eat them in a boat
You do not need to provide edge labels. [2 marks]
c) Show a sequence of parsing steps using a “transition-based parser” that will produce this dependency parse. Be sure to include the state of the stack and buffer at every step. [3 marks]
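The transition-based parsing steps asked for above can be simulated mechanically. The sketch below applies arc-standard transitions (SHIFT, LEFT-ARC, RIGHT-ARC) to a toy two-word sentence and records the stack and buffer after each step; the function and the toy example are illustrative, not an answer for the exam sentence:

```python
# Minimal arc-standard transition simulator (illustrative sketch only).
def run_transitions(words, transitions):
    stack, buffer, arcs, trace = ["ROOT"], list(words), [], []
    for t in transitions:
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":           # head = top of stack, dependent = second
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == "RIGHT-ARC":          # head = second on stack, dependent = top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
        trace.append((t, list(stack), list(buffer)))
    return arcs, trace

# Toy example: "I eat", with arcs eat -> I and ROOT -> eat.
arcs, trace = run_transitions(
    ["I", "eat"],
    ["SHIFT", "SHIFT", "LEFT-ARC", "RIGHT-ARC"],
)
print(arcs)    # [('eat', 'I'), ('ROOT', 'eat')]
```

Printing each entry of `trace` reproduces exactly the stack/buffer table the question asks you to write out by hand.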
Section C: Algorithmic Questions [12 marks]
In this section you are asked to demonstrate your understanding of the methods that we have studied in this subject by performing algorithmic calculations.
Question 7: N-gram language models [6 marks]
This question asks you to calculate the probabilities for N-gram language models. You should leave your answers as fractions. Consider the following corpus, where each line is a sentence:
natural language natural language
processing
understanding
natural language applications in the wild
a) Calculate a bigram language model over this data, using add-1 smoothing. Add additional symbols as needed. Hint: you should start by considering each context word, one at a time. [3 marks]
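As a reference for the mechanics of add-1 (Laplace) smoothing, here is a sketch over a small toy corpus (deliberately not the exam corpus, so it is not a model answer). The `<s>`/`</s>` boundary symbols and the choice to include them in the vocabulary are assumptions; conventions vary:

```python
# Sketch of add-1 smoothed bigram estimation over a toy corpus.
from collections import Counter
from fractions import Fraction

corpus = [["a", "b"], ["a", "b", "c"]]
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    padded = ["<s>"] + sent + ["</s>"]
    for w1, w2 in zip(padded, padded[1:]):
        unigrams[w1] += 1        # count of w1 as a context
        bigrams[(w1, w2)] += 1

vocab = {"<s>", "</s>", "a", "b", "c"}   # assumed to include boundary symbols
V = len(vocab)

def p_add1(w2, w1):
    """P(w2 | w1) with add-1 smoothing, as an exact fraction."""
    return Fraction(bigrams[(w1, w2)] + 1, unigrams[w1] + V)

print(p_add1("b", "a"))   # P(b | a) = (2 + 1) / (2 + 5) = 3/7
```

Using `Fraction` keeps the probabilities as exact fractions, matching the form the question asks answers to be left in.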
b) Compute the probability of the sentence “language understanding applications” under your bigram language model. [1 mark]
c) The “Kneser-Ney” method for language modelling differs in several ways to the language model used above. Explain two such differences, making reference to the above corpus, where appropriate, to support your answer. [2 marks]
Question 8: Context-Free Grammar & Parsing [6 marks]
This question is about analysing syntax. Consider the following ambiguous sentence:
Find the boy with an eye
a) Describe the syntactic ambiguity in this sentence. [1 mark]
b) Write a set of linguistically-plausible CFG productions that can represent and structurally differentiate the two interpretations. [2 marks]
c) Perform CYK parsing of the sentence using your grammar. You should include the full chart described in the lecture, which will include the edges for both possible interpretations. You should label the order you fill in the cells. [3 marks]
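The chart-filling procedure asked for above can be sketched as a compact CYK recogniser. The grammar and lexicon below are a toy CNF grammar, not an answer to the question; the code shows only the order in which cells are filled (shorter spans first, then all split points):

```python
# Compact CYK recogniser for a CNF grammar (illustrative sketch only).
from itertools import product

grammar = {                       # CNF binary rules: (rhs1, rhs2) -> {lhs, ...}
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
lexicon = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cyk(words):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                # length-1 spans from the lexicon
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):                 # fill shorter spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # try every split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= grammar.get((b, c), set())
    return "S" in chart[0][n]

print(cyk("the dog saw the cat".split()))   # True
```

An ambiguous sentence would simply leave more than one derivation reaching the S in the top cell; a recogniser like this reports only whether any parse exists.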