Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CSE 374 23WI Homework 5

Due: Tues, Feb 28, 2023, at 11:59 pm

Assignment goals

You will develop a more complex program using data structures. You will:

Develop a larger system one part at a time, testing as you go.

Learn about the trie data structure, a version of a search tree.

Work with trees, structs, and dynamically allocated data in C.

Read and process text files in C.

Practice writing simple Makefiles.

Update

You are encouraged to use the starter files here ( materials/-/tree/master/hw5) for this assignment.

Synopsis

You will build programs to implement T9 predictive text, a text input mode developed originally for cell flip phones and still used for numeric keypads. Each number from 2-9 on the keypad represent three or four letters, the number 0 represents a space, and 1 represents a set of symbols such as { , . ! ? } etc. The numbers from 2-9 represent letters as follows:

2 ABC

3 DEF

4 GHI

5 JKL

6 MNO

7 PQRS

8 TUV

9 WXYZ

Since multiple letters map to a single number, many key sequences represent multiple words. For example, the input 2665 represents "book" and "cool", among other possibilities.

To translate from number sequences to words, we will use a data structure known as a trie. A trie is a    multiway branching structure (tree) that stores the prefixes of sequences. As we travel down a path in  the trie, we reach word sequences spelled out by the numbers along that path. Classic trie data           structures have edges labeled with letters to store prefixes of strings. But for this application, we used a compressed trie that has only 10 possible branches at each node instead of 26, since the digits 0-9    represent the 26 letters, space and symbols. Because of this, an extra layer of complexity is needed to figure out the string represented by a path.

(Actually, for our application, each node only needs 8 possible children numbered 2-9, since digits 0    and 1 don't encode letters. But writing the code might be easier if nodes have 10 children numbered 0- 9, since then subtree number n corresponds to digit n. Feel free to use either representation for the trie depending on which seems simpler to implement.)

For more information on the trie data structure, here (http://en.wikipedia.org/wiki/Trie) is a link to the Wikipedia article.

Technical Requirements

Implement in C a program t9. The command

./t9 F工LE

should read in a dictionary file (F工LE) that contains a list of words. Translate each word in the             dictionary into its numeric key sequence, then add the key sequence to your trie, with the word at the  end of the path corresponding to the digits. If a word with the same numeric sequence already exists in the trie, add the new word to the trie as a link to a new node with an edge labeled '#' instead of one of  the digits 2-9. (The words linked from a node by the '#' edges essentially form a "linked list" of words    that have the same numeric code, but to simplify the implementation we'll use additional tree nodes to link together words with duplicate codes instead of defining another kind of linked-list node to deal     with this situation.)

For example, if the program reads the set of words "jello","rocks", and "socks" from the dictionary and adds it to an empty trie, the resulting trie should look like this:

Once your program has read the dictionary and built the trie containing the words in it, it should start  an interactive session. The user should be able to type numbers and the program should print out the  word corresponding to the sequence of decimal digits in the numbers. Your program should use the    numbers typed by the user to traverse the trie that has already been created, retrieve the word, and     print it to the screen. If the user then enters '#', the program should print the next word in the trie that  has the same numeric value, and so forth. The user can also type a number followed by one or more '#' characters - this should print the same word that would be found by typing the number and individual '#' characters on separate input lines. If the user enters '#' and there are no more words in the trie that have the same numeric digit sequence, then the program should print "There are no more T9onyms". If no word in the trie matches the input number, the program should print "Not found in current            dictionary".

As an example, if we run the program using the above trie, an interactive session might look like this:

Enter "exit" to quit.

Enter Key Sequence (or "#" for next word):

> 76257

'rocks'

Enter Key Sequence (or "#" for next word):

> #

'socks'

Enter Key Sequence (or "#" for next word):

> 53556

'jello'

Enter Key Sequence (or "#" for next word):

> #

There are no more T9onyms

 

Enter Key Sequence (or "#" for next word):

> 76257#

'socks'

Enter Key Sequence (or "#" for next word):

> 76257##

There are no more T9onyms

>4423

Not found in current dictionary

>exit

The interactive session should terminate either when the user enters the word "exit" or when the end- of-file is reached on the interactive input (indicated by typing control-D on the keyboard on a separate line by itself).

Note: Be sure your program properly handles the case if the user types more "#"s than there are T9onyms for a particular number.

We provide you with two text files, smallDictionary.txt

(T9files/smallDictionary.txt) and dictionary.txt

(T9files/dictionary.txt). Each of these text files contains a list of words to be used in          constructing a trie - the small one primarily for testing, and the large one for the final program.           Translate each word in the file into its associated T9 digit key sequence, and add the word to the trie. In the case of multiple words having the same key sequence k, the first word encountered in the text file  with that key sequence is represented by the key sequence k, the next word from the input file with the same key sequence is represented by k#, the next by k##, etc. For example, 2273 can represent acre,    bard, bare, base, cape, card, care, or case. To disambiguate, acre would be represented by 2273, bard   by 2273#, bare by 2273##, and so forth. When a user inputs a key sequence, print the appropriate word.

Your trie data structure should contain nodes to represent the tree, and C strings (\0-terminated char  arrays) containing copies of the words read from the dictionary file, linked to appropriate nodes in the trie.

Besides the general specification given above, your program should meet the following requirements to receive full credit.

1. You should create a Makefile and use maketo compile your program. Your Makefileshould only recompile the necessary part(s) of the program aler changes are made. Your Makefile   should include a cleantarget so that make cleanwill remove the t9 executable file, all .o  compiled files, and all files with names ending in ~ (tilde) - i.e., backup files created by text editors and similar programs.

2. Use mallocto dynamically allocate the nodes, strings, and any other data that make up your trie.

3. If you need to create a copy of a string or other variable-size data, you should dynamically allocate an appropriate amount of storage using malloc and return the storage with freewhen you are done with it. The amount allocated should be based on the actual size needed, not some arbitrary size that is assumed to be "large enough".

4. Use standard C library functions where possible; do not reimplement operations available in the standard libraries.

5. You must check the return status (result code) of every library function you call to be sure that no  errors occurred. In particular, mallocwill return NULL if it is unable to allocate storage. Although this is extremely unlikely to happen, a robust program must check for the possibility and react     appropriately if it does.

6. If an error occurs when opening or reading a file, the program should write an appropriate error message to stderr and terminate if there is no further work to be done.

7. Before the program terminates, all dynamically allocated data must be properly freed (i.e., free everything acquired with malloc). This should be done explicitly without relying on the operating system to clean up aler the program finishes.

8. Your code must compile and run without errors or warnings when your Makefile compiles your code with gcc -Wall -g -std=c17 on cancun>. Your program should build without errors when make is used to run your Makefile.

9. Your program should terminate cleanly with no memory leaks or other memory errors reported  when it is run using valgrind. (Warning: valgrindslows down execution considerably. It will take several minutes to load the full dictionary.txt file and then free the resulting tree under        valgrind. We suggest you use smaller input files during development to test for memory        problems with valgrind.) If memory leaks or errors are detected, valgrind's --leak-   check=full option will be useful to generate more extensive messages with information about

the memory problems.

Code Quality Requirements

For full credit you must:

1. Divide your program into suitable source files (at least three) and functions, each of which does a single well-defined aspect of the assignment. For example, there should almost certainly be a    header and source file for the trie data structure and the operations needed on it (create a new   empty trie, insert a word, search, delete the trie, etc.), and an additional file that contains the the main part of the program that uses the trie data structure. Your program most definitely may not consist of one huge main function that does everything.

2. The header ( .h) file for the trie (and any other header files) should only declare items that are       shared between client programs that use the header and the file(s) that implement it. Don't          include in the header file implementation details that should be hidden, including helper functions that are not meant to be used by other files. Be sure to use the standard #ifndef preprocessor    trick.

3. Include appropriate function prototype declarations near the beginning of each source file for local helper functions defined in that file whose declarations are not included in a header file for use by other files. The function declaration should be accompanied by appropriate heading comments    documenting the function.

4. Comment sensibly, but not excessively. You should not use comments to repeat the obvious or      explain how the C language works - assume that the reader knows C at least as well as you do. Your code should, however, include the following minimum comments:

  Every function must include a heading comment that explains what the function does (not

how it does it), including the significance of all parameters. It must not be necessary to read   the function code to determine how to call it or what happens when it is called. This comment should appear where the function is declared (in a header file or at the top of a .cfile for local functions). The comment should not be repeated later where the function is defined             (implemented).

No global variables. Use parameters (particularly pointers) appropriately. Exception: It may be appropriate to use global variables for constant data like translation tables if the program is   better structured this way.

No unnecessary computation or excessive use of malloc or free - these are relatively

expensive. Don't make unnecessary copies of large data structures; use pointers. (Copies of ints, pointers, and similar things are cheap; copies of large arrays and structs are expensive.)

As with homework 3, we will be running a linter to check your files for certain style elements. We will be running the script (./hw5Linter) which calls clint.py (clint.py) and then gets rid of certain output          messages that we will not penalize, such as the "No copyright" warning and using tabs versus spaces.  You can run the hw5Linter on your files one at a time to do the same checks that we will do on your     assignment. If you have questions about warnings you are seeing when you run hw5Linter, you can     make a post on edstem and we will take a look.

Implementation Hints

1. There are a lot of things to get right here; the job may seem overwhelming if you try to do it all at  once. But if you break it into small tasks, each one of which can be done individually by itself, it    should be quite manageable. For instance, figure out how to add a single word to the trie before  you implement the logic to process all of the words in the dictionary. Figure out how to add a few words that have diferent numeric codes before you handle words that have the same codes.       Implement the code to traverse the trie to translate an input key sequence into the corresponding word once you've built the trie, not before.

2. To read user input, you will want to use scanf

3. Before you start typing code into the computer, spend some time sketching out data structures and code (particularly trie node structs) on paper or on a whiteboard. Be sure you understand what  you are trying to do before you start typing.

4. Every time you add something new to your code (see hint #1), test it. Right Now! It is much easier to find and fix problems if you can isolate the potential bug to a small section of code you just     added or changed. gdb and printf are your friends here to examine values while debugging.

5. You will probably find it very useful to include code that can print the contents of the trie in some understandable format. This is not required, but how can you be sure your code is correct if you  can't look at the trie that is built for a small set of input data? gdb is helpful here, but it can be    tedious to visualize a data structure with one gdb print command at a time.

6. Start with a small data file and figure out in advance what the resulting trie should look like. Then verify that the program does, in fact, create that trie.

7. gdb is your friend.

8. To build the trie, you need some way to translate characters (primarily letters) contained in words read from the dictionary file to the corresponding keypad digits. It is probably a great idea to        include in your code a function that takes a character as an argument and returns the                  corresponding digit. This can be implemented with a series of if-elseif-elsestatements, but another way to do it is to have an array with one entry for each possible character. In the entry for  each character, store the corresponding digit code. Then you can look up the code for a character  without a complicated nest of ifstatements. (Recall that in C, as in Java and other languages, a   character can be used as a small integer. That is particularly helpful for exactly this kind of            application, where we might want to use a character value as an index into a table of data.)

9. Be sure to check for errors like trying to open a nonexistent file to see if your error handling is working properly.

10. Once you're done, read the instructions again to see if you overlooked anything.

Test Sequences:

The sequences below can be used to validate your trie against the supplied dictionary.txtfile.

1. 22737: acres, bards, barer,bares,barfs,baser,bases,caper,capes,cards,carer,cares,cases

2. 46637: goner,goods,goofs,homer,homes,honer,hones,hoods,hoofs,inner

3. 2273: acre, bard,bare,barf,base,cape,card,care,case

4. 729: paw,pax,pay,raw,rax,ray,saw,sax,say

5. 76737: popes,pores,poser,poses,roper,ropes,roses,sords,sorer,sores

Assessment

Your solutions should be:

  Correct, meeting the above specifications

  Compile and run on cancun

  Written in good style. Use the hw5Linterstyle checker to locate potential problems   Have no memory leaks or errors. Use valgrindto help check for possible problems.

We will not run additional tests against your code aler you submit to Gradescope, but we will manually grade to make sure you follow the instructions in the spec.

Turn-in Instructions

Use gradescope, linked on the course resources web page, to submit your files (drag them onto the gradescope page):

the source code files for your program (trienode.h, trie.c, tnine.c, and Makefile),

header files, and

the Makefilefor your program

Turn in separate files; do not turn in a tar, zip, or other kind of archive file (Gradescope should  automatically unpack the files if you do use a zip archive or something similar to drag your files to Gradescope's window, but be sure it successfully unpacks the actual files.)

Gradescope will allow you to turn in your homework up to two days late, if you choose to use one or two of your late days.