Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Assignment 2: Autocomplete Lookup

General

You must read fully and carefully the assignment speciication and instructions.

Course: COMP20003 Algorithms and Data Structures @ Semester 2, 2023

· Deadline Submission: Friday 8th September 2023 @ 5 pm (end of Week 7)

· Course Weight: 15%

· Assignment type: individual

· ILOs covered: 1, 2, 3, 4

Submission method: via ED

Purpose

The purpose of this assignment is for you to:

Improve your proiciency in C programming and your dexterity with dynamic memory allocation.

· Demonstrate understanding of a concrete data structure (radix tree) and implement a set of algorithms.

· Practice multi-ile programming and improve your proiciency in using UNIX utilities.

· Practice writing reports which systematically compare the eciency of diferent data structures.

Walkthrough

An error occurred.

Try watching this video on www.youtube.com, or enable JavaScript if it is disabled in your browser.

Introduction

Autocomplete

Autocomplete highly enhances user experience for two reasons: it minimises the number of

keystrokes needed to specify a string input, while at the same time helps users with a suggestion of useful closely related inputs. Autocomplete typically wants to suggest the most likely strings that

share the preix speciied so far. Typically the preix is extended every time a new keyboard event is triggered, hinting that any successful autocomplete function should be as fast as the average human typist. Surveys suggest that the average person types 200 characters per minute, which means that   ideally our autocomplete algorithm should not take longer than 0.3 seconds to suggest the correct    autocompletion candidates. If autocomplete takes much longer, users will dislike our solution.

Another important aspect is that more than one candidate should be suggested according to their  likelihood. If we fail suggesting the correct autocompletions users will not like our implementation either.

Autocomplete can be eciently implemented using a radix tree, which is a concrete data structure

that stores and supports lookup of keys. Note that radix trees are like binary search trees but may

have more complex rules for children (e.g. branching is based on bit-level comparison of the strings,  rather than full string comparisons). Radix trees are used as an associative data structure (trie), where each node does not store the full key (stores part of the key). The position of the node in the tree

deines the key associated with the node, where all its equal subtree descendants share the same key preix. A key can have an associated weight to indicate its likelihood.

Autocompletion can be implemented in C using a number of abstract data structures and underlying concrete data structures. Any implementation must support the operations:

insert a new item < key, data > into a tree and

find_and_traverse the tree given a preix, and return (or print) a list of pairs < key, data > where all keys share the same preix.

In getting an understanding of the theory behind the new data structure used in stage 3, you may ind the original paper useful: PATRICIA—Practical Algorithm To Retrieve Information Coded in

Alphanumeric (Morrison, 1968).

Radix Trees

Though there are more compact ways to implement the radix tree data structure (e.g. the table

structure described in the original paper will allow the same functionality, but requires less data to be stored), the approach we recommend is a linked data structure made up of nodes. These nodes will    contain:

An  integer  representing the number of bits which are a common preix to all entries in the chain. If there are no common bits, this will be 0.

A dynamically allocated array of  character  values, this will contain the entire common preix.

One pointer to a node called  branchA , this covers the branching structure when the bit following the common preix is 0.

One pointer to a node called  branchB , this covers the branching structure when the bit following the common preix is 1.

One pointer to a list of associated  data  items.

This particular approach assumes a unique termination character is also part of the string (we will use the character where all bits of the character are 0 ( \0 ) for this character).

With this approach, we keep track of the number of bits we have processed through, adding the preixes of those bits to the key constructed so far.

To check if the key has been completed, after processing the bits in the current node, the number of   bits seen can be checked to see if it is a multiple of 8 and the inal character can be checked to see if it contains a  \0 (both conditions are necessary).

To treat the searched key as a preix, the terminating character is ignored, with all descendant nodes of the matching preix position traversed and the associated values of all completed keys returned.

Your Task

Your task involves two new stages and a report. The irst new stage is Stage 2, which implements

Autocomplete behaviour using a sorted array as the concrete implementation. Then the second new  stage will be Stage 3, which uses a radix tree. After completing the two new stages, you will compare their performance in a report. For the report, you will need to devise and carry out a process which   makes use of what you have learned in the subject so far.

For the two new stages, your output to the output ile will be the same format as Assignment 1, but  the output to the terminal will change. To support your analysis, this output is recommended to be   the number of bits compared, the number of characters compared and the number of whole strings compared.

Implementation Details

Example Radix Tree Insertions

The irst insertion is conceptually very simple. The radix tree is initially empty, so the preix

comprises the entire key. The diagram below shows the data structure, with the letters shown to illustrate what the binary values tell us about the key.

Root Node

120

preixBits

preix

0b01000100

01101111

01101101

01101001

01101110

01101111

00100111

01110011

00100000

01010000

01101001

01111010

01111010

01100001

00000000

D

o

m

i

n

o

'

s

P

i

z

z

a

\0

branchA

NULL

branchB

NULL

data

...

Work needs to be done when the second key is added. If we add "Arbory Bar and Eatery" to the radix tree, the two difer in the irst character. D has binary representation 0b01000100, and A has binary  representation 0b01000001, so the irst 5 bits (0b01000) are a shared preix.

01110010

01100010

01101111

01110010

01111001

00100000

01000010

01100001

01110010

00100000

01100001

01101110

01100100

00100000

01000101

01100001

01110100

01100101

01110010

01111001

00000000

Node

preixBits

171

preix

001

A

r                  b o                  r y B                 a                   r a                  n                  d E                  s                  t                   a                  t                  e                  \0

branchA

NULL

branchB

NULL

data

...

Node

preixBits

115

preix

100

01101111

01101101

01101001

01101110

01101111

00100111

01110011

00100000

01010000

01101001

01111010

01111010

01100001

00000000

D

o

m

i

n

o

'

s

P

i

z

z

a

\0

branchA

NULL

branchB

NULL

data

...

&nb