SAMPLE EXAMINATION

Semester 1 - SAMPLE, 2023

COMP5046 / 4446 Natural Language Processing

1. What is an advantage of FastText over Word2Vec? Give an example that demonstrates this advantage. [2pt]

2. Describe two metrics that we can use to evaluate word vectors. Indicate whether each one is intrinsic or extrinsic. [2pt]

3. Consider an RNN-based model for the tasks below. For each task, indicate (a) how many outputs will be produced, (b) what the set of possible values for each output is, and (c) what input is provided at each step in the RNN.

Task A: Named Entity Recognition [3pt]

Task B: Sentiment Analysis [3pt]

4. Describe the difference between defining your vocabulary as every token in the data and defining it using BPE. Why is one better than the other for language modelling? [3pt]

5. A Hidden Markov Model includes states, observations, transition probabilities, and observation likelihoods. Describe what each of these would correspond to when using an HMM for POS tagging. [4pt]

6. The IOB format categorizes tagged tokens as I, O, and B. Why are three tags necessary, and what problem would be caused if we used only the I and O tags? Give an example to support your argument. [2pt]

7.  Describe the main intuition behind attention in a neural network model. [1pt]

8.  Why does the transformer have a positional encoding? [1pt]

9. What are the three possible actions in a transition-based parser, and what does each one do? [6pt]

10. What is the purpose of backpropagation in model training? [1pt]

11. When annotating data, what is the purpose of adjudication? [1pt]

12. (a) Given the following values from an evaluation of a model, calculate precision, recall, and F-score:

True-Positive: 10

False-Positive: 90

True-Negative: 100

False-Negative: 1

Show all your working and calculations. Full credit will not be given without the steps required for the calculation. [3pt]

(b) If this model were used for spam email detection, would its performance be good enough for practical use? Why or why not? [1pt]