ECE250: Lab Project 2

Due Date: Wednesday March 24, 2021, 11:00 pm (EDT)


1. Project Description

In this project you will implement a trie data structure using recursive approach. A trie is a 26- ary tree where the root node represents an empty string “” and if the kth (k going from 0 to 25) subtree is not a null subtree, it represents a string that is the concatenation of the characters represented by the parent and the k th letter of the alphabet (where ‘a’ is the 0th letter, ‘b’ is the 1st letter, and so on). Each node may additionally indicate that it is a terminal entry for a word. While a trie could be used to store hyphenated and capitalized words together with those with apostrophes, in this project we will restrict ourselves to words made up of the 26 (twenty-six) lowercase letters of the English alphabet.

For example, consider the sentence “the fable then faded from my thoughts and memory”. If we wanted to put these nine words into a trie, it would look like Figure 1. Only the root node explicitly shows its 26 subtrees, each subtree being associated with a letter of the alphabet. In this example, 22 of the subtrees of the root node are null, as there are no words that begin with, for example, ‘b’. Four of the children of the root are not null; those representing the letters ‘a’, ‘f’, ‘m’, and ‘t’. For the other nodes, only the non-null subtrees are shown. These subtrees represent words starting with these letters. Terminal nodes are represented as red circles.

Figure 1. A trie containing the words in the sentence “the fable then faded from my thoughts and memory”. Note that edges are labeled with letters only for clarity. The letters are not actually stored in the edges or the nodes. Instead, the position of a node determines the key associated with it. For example, the first child of any node represents ‘a’, the second child represents ‘b’ and so on.

Autocomplete is the most important application of trie data structure especially in the current age of smartphones and smart devices. Autocomplete is a mechanism in which the device software predicts the rest of a word a user is typing based on the string prefix. Autocomplete using trie data structure is implemented in various software applications such as web browsers, search engines, source code editors and many more.

We ask you to implement the trie using a recursive implementation and C++ classes. You can use the C++ string class in the standard library. To access the k th character, use an index (similar to arrays). You are not allowed to use any classes from the STL C++ library in this project.


2. Program Design

Write a short description of your design. You will submit this document along with your C++ solution files for marking and it must include your design decisions. Please follow the Design Document Template given on LEARN under “Contents | Course Materials | Tutorials | Tutorial #1 | Design Document Template” when writing your document.


3. Input and Output Requirements

Write a test program for the trie called trietest.cpp. The test program will read commands from standard input and write the output to standard output. The program will respond to the commands described in this section.

The expected running time of each command is specified in the fourth column of the following table; n indicates the number of characters in a word and N indicates the number of words already in the trie.

Note that for this project you will assume that the input provided will be as described below. Assume that there are no errors in the input, except for the possibility of having invalid characters in the word for commands insert (i), erase (e), and search (s).

Command
Parameters
Description
Expected Running Time
Output
  i   w   Inserts a word.   O(n)
  success: if the insertion command was successful.
  failure: if the word is already in the trie (no duplicates are
  allowed).
  illegal argument: if the word contains any characters
  other than those of the English alphabet in lowercase
  (‘a’ through ‘z’), throw an illegal_argument exception
  and output “illegal argument”.
  e   w   Erases a word.   O(n)
  success: if the word is in the trie and was erased.
  failure: if the word is not in the trie, or the trie is empty.
  illegal argument: if the word contains any characters
  other than those of the English alphabet in lowercase
  (‘a’ through ‘z’), throw an illegal_argument exception
  and output “illegal argument”.
  s   w   Searches for a word.   O(n)
  found w: if the word exists in the trie.
  not found: if the word doesn’t exist in the trie.
  illegal argument: if the word contains any characters
  other than those of the English alphabet in lowercase
  (‘a’ through ‘z’), throw an illegal_argument exception
  and output “illegal argument”.
  print
  Prints all the words in the
  trie in alphabetical
  order (depth first traversal
  of the trie).
  O(N)
  word1 word2 word3 ...
  There should be no trailing blanks after the last word.
  No output is created if the trie is empty.
  autocomplete   prefix*
  Prints list of all keys with a
  given prefix in alphabetical
  order (depth first traversal
  of the trie).
  For example, the command
  autocomplete
  fox* will print the
  list of all words in the
  trie starting with “fox”,
  like fox, foxes,
  foxlike, etc. in
  alphabetical order.
  O(N)
  word1 word2 word3 ...
  Assume that prefix will contain characters only from
  the English alphabet in lowercase (‘a’ through ‘z’).
  There should be no trailing blanks after the last word.
  No output is created if the trie is empty or if the prefix is not
  in the trie.
  empty
  Checks if the trie is empty.   O(1)
  empty 1: if trie is empty.
  empty 0: if trie is not empty.
  clear
  Deletes all the words from
  the trie.
  O(N)
  success
  If the trie is already empty it prints success as well.
  size
  Returns the number of
  words in the trie.
  O(1)
  number of words is count: if the trie is not empty.
  number of words is 0: if the trie is empty.
  exit
  Last command for
  all input files.

  This command does not print any output.


In your design document, you should describe how you have achieved the expected runtimes in your implementation.

● Exception Handling

For commands “i”, “e”, and “s”, we ask that you handle invalid characters in the word using exception handling. In order to implement this, you will need to:

(a) Define a class for this exception, call it illegal_exception

(b) Throw an instance of the class when the condition is encountered with this line: throw illegal_exception();

(c) Use a try/catch block to handle this exception and print the desired output

● Test Files

You will find on LEARN example input files with the corresponding output files. The files are named test01.in, test02.in and so on with the output files named test01.out, test02.out and so on.

When you create input files, be sure to follow the example files. Note that each line ends with a UNIX new line character, and that there are no blank spaces before this character. Each input file ends with the line “exit” – followed by a new line character.

Be aware that text files created on DOS/Windows machines have different line endings than files created on UNIX/Linux.

● Checking for Memory Leaks

When testing your program you need to ensure that all memory allocated has been freed adequately. You can use the valgrind utility available in the ECE UNIX server (eceUbuntu) to ensure that your program is handling this correctly.

In this project, you can use the following command on eceUbuntu, to identify memory leaks in your executable triedriver:

valgrind --leak-check=yes ./triedriver < test01.in

Memory leaks are usually associated with deficiencies in the destructor function of a class.


4. How to Submit Your Program

Once you have completed your solution and tested it comprehensively on your computer or on the lab computers, you have to transfer your files to eceUbuntu and test there since we perform the automated testing using this environment.

Once you finish testing on eceUbuntu, you will create a compressed file (tar.gz) that should contain:

● A document (maximum two pages) describing your design. A document beyond two pages will not be marked. This document must be typed for marking and should be submitted in PDF with the name xxxxxxxx_design_pn.pdf, where xxxxxxxx is your UW user id (e.g., jsmith) and n is the project number that is 2 (two) for this submission.

● A test program called trietest.cpp that reads the commands and writes the output.

● Required header files and classes (ending in .h, .hpp. or cpp)

● A “make” utility file called “Makefile”, with instructions on how to compile your solution and create an executable file named triedriver

The name of your submission file should be xxxxxxxx_pn.tar.gz, where xxxxxxxx is your UW user ID (e.g., jsmith) and n is the project number that is 2 (two) for this submission. We recommend that you create this tar.gz file on eceUbuntu.