COMP20003 Algorithms and Data Structures @ Semester 2, 2023
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Assignment 2: Autocomplete Lookup
General
You must read fully and carefully the assignment speciication and instructions.
Course: COMP20003 Algorithms and Data Structures @ Semester 2, 2023
· Deadline Submission: Friday 8th September 2023 @ 5 pm (end of Week 7)
· Course Weight: 15%
· Assignment type: individual
· ILOs covered: 1, 2, 3, 4
Submission method: via ED
Purpose
The purpose of this assignment is for you to:
Improve your proiciency in C programming and your dexterity with dynamic memory allocation.
· Demonstrate understanding of a concrete data structure (radix tree) and implement a set of algorithms.
· Practice multi-ile programming and improve your proiciency in using UNIX utilities.
· Practice writing reports which systematically compare the eciency of diferent data structures.
Walkthrough
An error occurred.
Try watching this video on www.youtube.com, or enable JavaScript if it is disabled in your browser.
Introduction
Autocomplete
Autocomplete highly enhances user experience for two reasons: it minimises the number of
keystrokes needed to specify a string input, while at the same time helps users with a suggestion of useful closely related inputs. Autocomplete typically wants to suggest the most likely strings that
share the preix speciied so far. Typically the preix is extended every time a new keyboard event is triggered, hinting that any successful autocomplete function should be as fast as the average human typist. Surveys suggest that the average person types 200 characters per minute, which means that ideally our autocomplete algorithm should not take longer than 0.3 seconds to suggest the correct autocompletion candidates. If autocomplete takes much longer, users will dislike our solution.
Another important aspect is that more than one candidate should be suggested according to their likelihood. If we fail suggesting the correct autocompletions users will not like our implementation either.
Autocomplete can be eciently implemented using a radix tree, which is a concrete data structure
that stores and supports lookup of keys. Note that radix trees are like binary search trees but may
have more complex rules for children (e.g. branching is based on bit-level comparison of the strings, rather than full string comparisons). Radix trees are used as an associative data structure (trie), where each node does not store the full key (stores part of the key). The position of the node in the tree
deines the key associated with the node, where all its equal subtree descendants share the same key preix. A key can have an associated weight to indicate its likelihood.
Autocompletion can be implemented in C using a number of abstract data structures and underlying concrete data structures. Any implementation must support the operations:
insert a new item < key, data > into a tree and
find_and_traverse the tree given a preix, and return (or print) a list of pairs < key, data > where all keys share the same preix.
In getting an understanding of the theory behind the new data structure used in stage 3, you may ind the original paper useful: PATRICIA—Practical Algorithm To Retrieve Information Coded in
Alphanumeric (Morrison, 1968).
Radix Trees
Though there are more compact ways to implement the radix tree data structure (e.g. the table
structure described in the original paper will allow the same functionality, but requires less data to be stored), the approach we recommend is a linked data structure made up of nodes. These nodes will contain:
An integer representing the number of bits which are a common preix to all entries in the chain. If there are no common bits, this will be 0.
A dynamically allocated array of character values, this will contain the entire common preix.
One pointer to a node called branchA , this covers the branching structure when the bit following the common preix is 0.
One pointer to a node called branchB , this covers the branching structure when the bit following the common preix is 1.
One pointer to a list of associated data items.
This particular approach assumes a unique termination character is also part of the string (we will use the character where all bits of the character are 0 ( \0 ) for this character).
With this approach, we keep track of the number of bits we have processed through, adding the preixes of those bits to the key constructed so far.
To check if the key has been completed, after processing the bits in the current node, the number of bits seen can be checked to see if it is a multiple of 8 and the inal character can be checked to see if it contains a \0 (both conditions are necessary).
To treat the searched key as a preix, the terminating character is ignored, with all descendant nodes of the matching preix position traversed and the associated values of all completed keys returned.
Your Task
Your task involves two new stages and a report. The irst new stage is Stage 2, which implements
Autocomplete behaviour using a sorted array as the concrete implementation. Then the second new stage will be Stage 3, which uses a radix tree. After completing the two new stages, you will compare their performance in a report. For the report, you will need to devise and carry out a process which makes use of what you have learned in the subject so far.
For the two new stages, your output to the output ile will be the same format as Assignment 1, but the output to the terminal will change. To support your analysis, this output is recommended to be the number of bits compared, the number of characters compared and the number of whole strings compared.
Implementation Details
Example Radix Tree Insertions
The irst insertion is conceptually very simple. The radix tree is initially empty, so the preix
comprises the entire key. The diagram below shows the data structure, with the letters shown to illustrate what the binary values tell us about the key.
|
Root Node |
120 |
|||||||||||||||||||
|
preixBits |
||||||||||||||||||||
|
preix |
0b01000100 |
01101111 |
01101101 |
01101001 |
01101110 |
01101111 |
00100111 |
01110011 |
00100000 |
01010000 |
01101001 |
01111010 |
01111010 |
01100001 |
00000000 |
|||||
|
D |
o |
m |
i |
|
n |
o |
' |
s |
|
P |
|
i |
|
z |
z |
a |
\0 |
|
||
|
branchA |
NULL |
|
||||||||||||||||||
|
branchB |
NULL |
|||||||||||||||||||
|
data |
... |
Work needs to be done when the second key is added. If we add "Arbory Bar and Eatery" to the radix tree, the two difer in the irst character. D has binary representation 0b01000100, and A has binary representation 0b01000001, so the irst 5 bits (0b01000) are a shared preix.
|
|
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Node |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
preixBits |
171 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
preix |
001 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
A |
r b o r y B a r a n d E s t a t e \0 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
branchA |
NULL |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
branchB |
NULL |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
data |
... |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Node |
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
preixBits |
115 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
preix |
100 |
01101111 |
01101101 |
01101001 |
01101110 |
01101111 |
00100111 |
01110011 |
00100000 |
01010000 |
01101001 |
01111010 |
01111010 |
01100001 |
00000000 |
|
|||||||||||||||||||||||||||||||||||||||||||
|
|
D |
o |
m |
i |
n |
o |
' |
s |
|
P |
i |
z |
z |
a |
\0 |
|
|||||||||||||||||||||||||||||||||||||||||||
|
branchA |
NULL |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
branchB |
NULL |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
data |
... |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
&nb |
2023-09-11
Autocomplete Lookup