Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CS214, Winter 2023,

Project #1

Parallel Quicksort

Deadline. The Project is due 11:59pm, Mon 02/27, 2023.

Submit Your Work. You must submit your codes via GitHub, and submit your report via GradeScope. In the test, we will only use the header file (quicksort .h). All functions you implemented must be in that header file. In our final test, we will call the functions defined in that file. See more details below.

Late Policy. You have up to two grace days (calendar day) for the entire quarter. You don’t lose any point by using grace days. If you decide to use grace days, please specify how many grace days you would like to use and the reason at the beginning of your submission.

Academic Integrity. You can discuss with your classmates or use online resources. In particular,

•  Collaboration  Policy.   You can discuss your implementations and your optimizations with your classmates. You can get help from the instructor, but only after you have thought about the problems on your own. If you discussed with anyone else, you need to acknowledge them in your report.

•  Other sources. It is OK to get inspiration (but not solutions) from books or online resources, again after you have carefully thought about the problems on your own. If you use any reference or webpage, or discussed with anyone, you must cite it fully and completely in your report.

•  Other Libraries. You are free to use any sequential code in C++ standard library (e.g., std::sort) by including the corresponding header files.  However, you cannot use any existing parallel libraries. In particular, all codes you submitted from quicksort .h must be implemented on your own. You are free to use any previous code implemented by yourself.  For example, if you have implemented some code in project 1, you can directly copy them to project 2 if you need.

However, you cannot:

• Share your report/code with anyone else.

• Read other’s report/code.

• Copy anything from other source, including code and report.

• Get help from others without acknowledging them in your report, or use any relevant resources (e.g., online article) without citing them.

Otherwise it will be considered cheating. We reserve the right to deduct points for using such material beyond reason.

Write-up.   You need to submit a report describing your methodology, optimizations,  and possible experiments you designed to test the performance and the experimental results.   The report will be an important part in grading.

In this project, we will implement a parallel quicksort algorithm as we talked about in class. There are points.

1    Overview

You can find the template files by this invitation link

https://classroom.github.com/a/BVbwHohp

There are two files to take care of:

• quicksort .cpp. This file is an example testing file for you. The actual testing code will be very similar, but differs in how to generate input data.  As you will see, the only function you need to implement is the quicksort function. When you test your own code, you can modify any files as you need, but please don’t leave anything useful in this cpp file. You can implement your distributions in this file.

• quicksort .h.  You need to implement your quicksort function in this file.  Of course you can add auxiliary functions or variables, but please make sure to put all your code in this file.

• makefile.  This is also an example makefile to help you test your code.  It is basically the same as what we will use in final test.  You can edit the file as you like.  Also, do not put anything useful in this file since that’s not used in the final test.

In this project, you need to submit:

• A mid-report on Tue, 02/14. In the mid report, you need to report your current progress. The require- ment of your mid report is to at least have a highly-optimized version of the partition algorithm. You need to present how you implemented it and some testing data.

• A final report on Mon 02/27. In the final report, you need to report how you implemented the entire algorithm, what test data you designed and some testing result.

• Your implementation of function quicksort in quicksort .h (just commit and push your changes to github).

2    Brief Guideline

Here’s a brief guideline.

1. First, note that you need to implement a scan algorithm.  For the best performance, you probably want to avoid allocating memory during the algorithm. That is to say, if you need extra space, allocate the memory in advance. This is because allocating memory in parallel can be expensive.

2. Then, based on your scan algorithm, implement a filter algorithm and then a partition algorithm.

3. For all of them, don’t forget to consider granularity control.

4. Then, you can implement your quicksort algorithm.

5. Next you need to test the correctness of your algorithm: you can use STL sorting algorithm compare your output with STL.

6. Finally, you need to test the performance of your code. Try to design different test cases and test their performance. If they show different performance, try to think about the reason behind that.


3    Possible Optimizations

Here are some optimizations and hints you can consider in your implementation.  Of course, you can use more.

1. Coarsening.  Use the similar idea as shown in Homework 0 and 1.  Try to find the best threshold to determine a sequential or a parallel run.

2. You can call STL sorting algorithm or other existing sorting algorithms to work on the base cases. Since STL is highly-optimized, it’s likely to give better performance than your sequential version.

3. Try to avoid using std::vector, especially avoid resizing it in parallel. This is because resizing involves allocating and deallocating memory, which is also slow in parallel.

4. Try to design a hash function to generate random numbers instead of rand().  This is because this function is not designed for running in parallel.

5. Choosing an arbitrary key as the pivot gives good theoretical bound using randomization, but in practice we can even do more to avoid bad luck. For example, you can pick up t random elements in the original array, find the median of them, and use the median as the pivot. This helps to split your input in a more balanced way. t is usually selected as a small constant.

6. Of course you can think of more interesting optimizations!

4    Design Your tests

To test the performance, you will need to design some test cases with different patterns.  This is to make sure your algorithm performs well in different inputs, and to understand the good and bad cases for your algorithm. In our final test, we will have five test cases, each of size 108 . In all test cases, the input keys are 64-bit integers. For each test, we will use the average of the five runs as your final running time (running 6 rounds, and report the average of the last five rounds). The actual testing data will not be given (you need to design your own input to test and debug), but the data pattern will be shown below.

Test case              Size                 Data


1                         108                  All distinct, uniformly random

2                         108                  Light duplicates, uniformly random

3                         108                  Light duplicates, skewed distribution

4                         108                  Heavy duplicates, uniformly random

5                         108                  Heavy duplicates, skewed distribution


5    Grade

Your grade will consist of three parts: mid-report (4pts), final report (4pts), performance (8pts + bonus, see more details below).


Performance

Test 1-3    (2 × 3 pts)

5-10s    4-5s     3.2-4s   2.5-3.2s 2-2.5s   < 2s

1pt per test

1.5pt per test

1.8pt per test

2 pt per test

2.5 pts per test (bonus)

3 pts per test (bonus)

Test 4-5    (1 × 2 pts)

4-10s    2.5-4s   1.5-2.5s 0.8-1.5s < 0.8s

0.5pt per test

0.7 pt per test

1 pts per test

1.2 pts per test

1.5 pts per test

To get the running time point, your algorithm has to be correct. That means, if your output is wrong, you cannot get points for the corresponding tests.

In the report, you need to describe your methodology, optimizations and their performance, possibly with running time data. Note that the report takes 50% of the entire project grade. They are supposed to be formal and in details.