COMP20003 Algorithms and Data Structures Semester 2, 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
COMP20003 Algorithms and Data Structures
Semester 2, 2022
Purpose
The purpose of this assignment is for you to:
Improve your proficiency in C programming and your dexterity with dynamic memory allocation. Demonstrate understanding of a concrete data structure (linked list).
Practice multi-file programming and improve your proficiency in using UNIX utilities.
Background
A dictionary is an abstract data type that stores and supports lookup of key, value pairs . For
example, in a telephone directory, the (string) key is a person or company name, and the value is
the phone number. In a student record lookup, the key would be a student ID number and the
value would be a complex structure containing all the other information about the student.
Your Task
In this assignment, you will create a simple dictionary based on a linked list to store information
from a dataset about the City of Melbourne . A user will be able to search this dictionary to retrieve
information about footpaths in Melbourne using attributes in the dataset (keys).
Your implementation will build the dictionary by reading the data from a file and inserting each
footpath record as a node in a linked list . You will also implement a method to search for a key in
the list, outputting any records that match the key. Note that keys are not guaranteed to be
unique!
Implementation Details
Stage 1 - Data Retrieval
In Stage 1, you will implement the basic functionality for a dictionary allowing the lookup of data
by key (address).
Your Makefile should produce an executable program called dict1. This program should take three
command line arguments:
The first argument will be the stage, for this part, the value will always be 1.
The second argument will be the filename of the data file.
The third argument will be the filename of the output file.
Your dict1 program should:
Read the data from the data file specified in the second command line argument. The data from
the CSV should be stored in a linked list of pointers to structs for the data . Datatypes for each
field should be consistent with those in the Dataset slide . Each record (row) should be stored in a
separate node.
Accept addresses from stdin, search the list for all matching records and print them to the output
file . You may assume that all queries will be terminated by a new line to allow for the querying of
footpaths without an address. If no matches for the query are found, your program should output
NOTFOUND.
In addition to outputting the record(s) to the output file, the number of records found should be
output to stdout.
For testing, it may be convenient to create a file of keys to be searched, one per line, and redirect
the input from this file. Use the UNIX operator < to redirect input from a file.
Example Execution
An example execution of the program might be:
Stage 2 - Building an Index for Approximate Lookup
In Stage 2, you will implement a simple calculated index on the dataset (address).
An index allows for faster access to particular attributes in a database, in this stage you will
construct an index on a single attribute and search on that attribute . Though you should assume a
diferent attribute could be used to build the index, you will only need to build an index for a single
attribute.
Your Makefile should now also produce an executable program called dict2. This program should
take three command line arguments:
The first argument will be the stage, for this part, the value will always be 2.
The second argument will be the filename of the data file.
The third argument will be the filename of the output file.
Your dict2 program should:
Just as in Stage 1, read the data from the data file specified in the second command line
argument . The data from the CSV should be stored in a linked list of pointers to structs for the
data . Datatypes for each field should be consistent with those in the Dataset slide . Each record
(row) should be stored in a separate node.
For Stage 2, you should also construct a sorted array which contains pointers to each node in the
dataset. This should be sorted on the grade1in attribute, which will be the 6th index (0-based).
For Stage 2, you should accept double type values from stdin, search the index for the closest
record to that value and print it to the output file. You may assume that no blank lines will be
present in the queries. The closest point is defined by the absolute diference between the search
value and the dataset value . In testing, you will only receive queries with a single closest record, if
multiple closest matches for the query are found in the dataset, your program may select any or
all matching records, you are free to attempt this as a challenge if you like.
In addition to outputting the record(s) to the output file, the grade1in value of the record found
should be output to stdout.
For testing, it may be convenient to create a file of keys to be searched, one per line, and redirect
the input from this file. Use the UNIX operator < to redirect input from a file.
Example Execution
An example execution of the program might be:
Stage 2 Output Notes
You should use the input key given by the user in both the output file and stdout outputs exactly
as entered e.g. If the input 20.4900 is given, the output should also read 20.4900.
For double type numerical outputs which come from the database, you may either round or print
them with the full precision as stored in your database . If you choose to round values, this
rounding should be sensible — distances and gradients may be naturally noisy values with low
precision of one or two decimal places, but latitudes and longitudes should attempt to retain as
much precision as possible, even if some loss of accuracy occurs during the translation to the
double data type.
Your approach should be space efficient but does not need to be time efficient.
Requirements
The following implementation requirements must be adhered to:
You must write your implementation in the C programming language.
You must write your code in a modular way, so that your implementation could be used in another
program without extensive rewriting or copying . This means that the dictionary operations are
kept together in a separate .c file, with its own header (.h) file, separate from the main program,
and other distinct modules are similarly separated into their own sections, requiring as little
knowledge of the internal details of each other as possible.
Your code should be easily extensible to multiple dictionaries. This means that the functions for
interacting with your dictionary should take as arguments not only the values required to perform
the operation required, but also a pointer to a particular dictionary, e.g. search(dictionary, value).
Your implementation must read the input file once only.
Your program should store strings in a space-efficient manner. If you are using malloc() to create
the space for a string, remember to allow space for the final end of string character, ‘\0’ (NULL).
Your program should store its data in a space-efficient manner. If you construct an index, this
index should not contain information it does not need.
Your stage 2 index should allow time-efficient lookup.
For stage 2, you may assume the index only needs to be built after all records are inserted.
A full Makefile is not provided for you. The Makefile should direct the compilation of your program.
To use the Makefile, make sure it is in the same directory as your code, and type make dict1 and
make dict2 to make the dictionary. You must submit your Makefile with your assignment.
Hints:
. If you haven’t used make before, try it on simple programs first. If it doesn’t work, read the error
messages carefully. A common problem in compiling multifile executables is in the included
header files . Note also that the whitespace before the command is a tab, and not multiple spaces .
. It is not a good idea to code your program as a single file and then try to break it down into
multiple files . Start by using multiple files, with minimal content, and make sure they are
communicating with each other before starting more serious coding.
Programming Style
Below is a style guide which assignments are evaluated against. For this subject, the 80 character limit is a guideline rather than a rule — if your code exceeds this limit, you should consider whether your code would be more readable if you instead rearranged it.
Some automatic evaluations of your code style may be performed where they are reliable. As determining whether these style-related issues are occurring sometimes involves non-trivial (and sometimes even undecidable) calculations, a simpler and more error-prone (but highly successful) solution is used. You may need to add a comment to identify these cases, so check any failing test outputs for instructions on how to resolve incorrectly flagged issues
Additional Support
Your tutors will be available to help with your assignment during the scheduled workshop times. Questions related to the assignment may be posted on the Ed discussion forum, using the folder tag Assignments for new posts. You should feel free to answer other students’ questions if you are confident of your skills.
A tutor will check the discussion forum regularly, and answer some questions, but be aware that for some questions you will just need to use your judgment and document your thinking. For example, a question like, “How much data should I use for the experiments?”, will not be answered; you must try out different data and see what makes sense.
If you have questions about your code specifically which you feel would reveal too much of the assignment, feel free to post a private question on the discussion forum.
Submission
Your C code files (including your Makefile and any other files needed to run your code) should be submitted through Ed to this assignment. Your programs must compile and run correctly on Ed. You may have developed your program in another environment, but it still must run on Ed at submission time. For this reason, and because there are often small, but significant, differences between compilers, it is suggested that if you are working in a different environment, you upload and test your code on Ed at reasonably frequent intervals.
A common reason for programs not to compile is that a file has been inadvertently omitted from the submission. Please check your submission, and resubmit all files if necessary.
Assessment
There are a total of 10 marks given for this assignment.
Your C program will be marked on the basis of accuracy, readability, and good C programming structure, safety and style, including documentation (2 marks). Safety refers to checking whether opening a file returns something, whether malloc()s do their job, etc. The documentation should
explain all major design decisions, and should be formatted so that it does not interfere with reading the code. As much as possible, try to make your code self-documenting by choosing descriptive variable names.
Plagiarism
This is an individual assignment. The work must be your own work.
While you may discuss your program development, coding problems and experimentation with your classmates, you must not share files, as doing this without proper attribution is considered plagiarism.
If you have borrowed ideas or taken inspiration from code and you are in doubt about whether it is plagiarism, provide a comment highlighting where you got that inspiration.
If you refer to published work in the discussion of your experiments, be sure to include a citation to the publication or the web link.
“Borrowing” of someone else’s code without acknowledgment is plagiarism. Plagiarism is considered a serious offense at the University of Melbourne. You should read the University code on Academic integrity and details on plagiarism. Make sure you are not plagiarizing, intentionally or unintentionally.
You are also advised that there will be a C programming component (on paper, not on a computer) in the final examination. Students who do not program their own assignments will be at a disadvantage for this part of the examination.
Late Policy
The late penalty is 10% of the available marks for that project for each day (or part thereof) overdue. Requests for extensions on medical grounds will need to be supported by a medical certificate. Any request received less than 48 hours before the assessment date (or after the date!) will generally not be accepted except in the most extreme circumstances. In general, extensions will not be granted if the interruption covers less than 10% of the project duration. Remember that departmental servers are often heavily loaded near project deadlines, and unexpected outages can occur; these will not be considered as grounds for an extension.
Students who experience difficulties due to personal circumstances are encouraged to make use of the appropriate University student support services, and to contact the lecturer, at the earliest opportunity.
Finally, we are here to help! There is information about getting help in this subject on the LMS. Frequently asked questions about the project will be answered on Ed.
2022-08-18