Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Homework Assignment #2 - Hash Function DIY Contest

Designed by Yiran "Lawrence" Luo, CSE 310 Fall 2022

● Type: Individual or group, up to 3 members

● Synopsis: Design your own hash function to encode ordinary strings and materialize simple uniform hashing (as is indicated in Hash Table lecture).

● Keywords: Hash table, hash function

● Programming Language: C++

● Requirements: You may code on your own PC but you have to make sure your program compiles and runs correctly on general.asu.

● Deliverables: A zip/tar archive containing the following:

○ main.cpp

○ hash.cpp and hash.h

○ Any new files you create on your own other than the givens

○ (optionally) all other given files in the starter package that you are not supposed to change, and will be overwritten during grading anyways

○ Makefile

○ A README.txt with your name(s) and a custom team alias in it that documents what your program is doing

● Full score: 15 pts

● Deadline: Friday, Oct 28th at 2359 hours, after which late submissions will not be accepted. No extension will be made because -

● +2 extra credits for the best-performing 5 teams/persons, with details in the last part .

Restrictions

You are only allowed to use the following functionalities in C++:

● <string>

● <cmath>

● <stdio.h> and <stdlib.h>

● <iostream> <fstream> and <sstream>

● struct and class

● Variable and pointer types of:  int, char, string, float, and double

Using any of the following functionalities gives you an unfair edge and are thus BANNED:

● <vector>, <map>, <set> or any predefined container library defined here.

● rand() or any randomizer as part of your hash function

Failure to comply with the above restrictions will result in a hefty penalty of at least 50%.

Overview

1. Download the starter package from Canvas, in which you are given the code templates and sample input cases. A gold standard demo is also provided to show you how the outputs should look like.

2. Your job is to design A)  A Chaining hash table that categorizes and stores string tokens loaded from a single input file, and B) Your own hash function that sorts each string into a corresponding hash table slot.   

a. Each hash table slot is a linked list of the Node struct.

b. You are free to design however you manipulate the input in the hash function using built-in math functions. But,

c. Your hash function should always return the same hash value given the same input string in any occasion. In other words, your hash function should never return different outputs in multiple occasions given the same input, and thus if your hash function turns out to be inconsistent, this would be considered incorrect.

3. Once you fit all the tokens from the input file into the hash table using your own hash function, print out the following information. The following example shows the expected printed outputs w.r.t 5 slots and 9 specific tokens, when your hash function sorts each token by its first letter:

a. The contents of your hash table's first 5 slots:

==== Printing the contents of the first 5 slots ====

Slot 0: apple abandon Amazon Applebee

Slot 1: banana barbaric boring Boeing

Slot 2:

Slot 3:

Slot 4: elephant

b. The lengths of every slot as in:

==== Printing the slot lengths ====

Slot 0: 4

Slot 1: 4

Slot 2: 0

Slot 3: 0

Slot 4: 1

c. The standard population deviation of all the hash table slot lengths. Notice the N in the following formula is equal to our k.

This float-type number gives us the idea how fairly your hash function sorts the strings from a certain input, e.g.:

==== Printing the standard variance ====

1.8330 

The smaller the standard deviation is, the better your hash function performs for each test case.

Input formats

You load the contents of a .txt file via stdio into an array of strings. (This part of code aka 'tokenizer' will be provided as part of main.cpp).

We assume an input file ALWAYS follows these formats:

1. The first line specifies the number of hash table slots k (5 <= k <= 100)

2. The following lines consist of one token per line, in only upper and lower case letters. There will not be duplicate tokens in a single input file.

3. The number of tokens N does not exceed 500.

4. Example (sample_input.txt):

5

Amazon

Boeing

apple

Applebee

abandon

banana

elephant

boring

barbaric

Build and test run

You are supposed to use a Makefile to build your program. Using the given Makefile, you should be running your program with a such command in prompt:

./encoder < inputs/input.txt

To load another file, simply replace what you feed in:

./encoder < inputs/input2.txt

You are more than welcome to create your customized input files/test cases and test the robustness of your hash function design.

Rubrics and Bonus

We grade your submission based on the correctness of :

1. The implementations of the hash table and your customized hash function, (3 pts)

2. The 3 print-outs of the hash table information (the first 5 slots' contents, all slots' lengths, and the standard variance), (3 x 3 pts) and

3. Turning in the correct files to build your program successfully on general.asu (3 pts).

When we are grading your work, we will be using a different set of test cases from the ones given to you in the starter kit. The top 5 teams/persons that achieve the lowest average standard deviations with our test cases will be granted 2 extra credits and will be decorated in our Hall of Top 5 Benchmarkers.