COMP SCI 3305, 7305 Parallel and Distributed Computing Primary Examination, Semester 1, 2019
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Primary Examination, Semester 1, 2019
Parallel and Distributed Computing
COMP SCI 3305, 7305
Question 1
(a) Provide brief answers to the following questions:
(a) What is a lock?
(b) What is a race condition? [4 marks]
(b) Below is pseudo-code for two threads, running in parallel. The shared
variable counter should be incremented by two. But you observe that sometimes this does not happen as expected.
i. Explain carefully circumstances in which an incorrect value could be produced. Note that tmp1 and tmp2 are local variables. You may assume that counter is initialized to 0.
// Task 1:
tmp1 = counter; tmp1++;
counter = tmp1;
// Task 2:
tmp2 = counter; tmp2++;
counter = tmp2; [5 marks]
(c) Discuss briefly the most important difference between MIMD and SPMD. [3 marks]
(d) This question is about high performance computers with a distributed memory architecture, and the interconnection networks used with them.
i. Draw a diagram showing the structure of a distributed memory high performance computer. Briefly explain each item on the dia- gram, in particular why the interconnection network is important. [3 marks]
ii. Suppose that you are designing a high performance computing, with a requirement to efficiently support applications that have a moderate to high intrinsic communication. Discuss suitable inter- connection network architectures. Compare at least two alterna- tives. [4 marks]
[Total for Question 1: 19 marks]
Question 2
(a) Suppose that you wish to use a group of threads to form the sum, in
a shared variable result, of an array of numbers. This can be done by giving each thread responsibility for a portion of the array, such that it forms, in its local memory space, the sum of its portion of the array. Outline (using pseudo-code), with explanation, an algorithm that can be executed in parallel by each task in an MPI program.
The array should be distributed, and global sum should be computed, using suitable collective operations.
Include in your explanation a diagram showing the respective tasks and address spaces, in particular, the placement of the array, and the variables representing local and global sums. [8 marks]
(b) i. Define the term speedup. [2 marks]
ii. The efficiency of a parallel program executed on N processors is defined as S/N, where S is the speedup.
If a parallel program is run on N processors, what is the (ideal) maximum speedup normally expected? What is the corresponding efficiency? Explain briefly. [3 marks]
(c) The following run times, t, were recorded for a program using increas- ing numbers of processors, P.
P 1 2 3 4 5 6 7
t 100 60 40 30 23 20 18
Using these results, draw two graphs. The first graph is the above data, t against P. For the second graph, compute the efficiency at each
value of P, and plot efficiency against P.
Explain your results. [8 marks]
[Total for Question 2: 21 marks]
Question 3
(a) Batch scheduling is commonly used in high-performance computing.
Briefly discuss what is meant by batch scheduling, and why it is often preferred to interactive operation on high-performance computers. [3 marks]
(b) The following pseudo-code describes the code executing in each of
several MPI tasks.
It is intended to represent nodes communicating with each other in a ring topology, with communications commands in the following order:
send value to right
receive value from right
The purpose of the code is to cyclically shift the values one place to the right around the ring.
i. What is the problem with this pattern of communication? Explain clearly, with diagrams. [3 marks]
ii. How would you fix it? Again explain with diagrams. [3 marks]
(c) This question is about MPI collective operations, especially MPI Reduce.
i. Explain, with diagrams, how an MPI Reduce collective operation is executed in MPI. You may use the addition of numbers as an example. [3 marks]
ii. Suppose that we are using MPI reduce with addition. What can we say about the order in which the individual addition operations are done? [2 marks]
iii. MPI does not provide a reduction with subtraction as the operator. Why? Give an example to illustrate your answer. [3 marks]
[Total for Question 3: 17 marks]
Question 4
(a) Discuss briefly the role of the Netflix Simian Army, and what it is capa- ble of doing. In your answer, address this question: would it be better
to build a distributed system that does not fail, rather than spending a lot of time and effort to develop something like the Simian Army? [6 marks]
(b) Consider each of the statements below. State whether the statement
is true or false, then write one line of explanation for your answer.
1. MPI is a programming language.
2. Java uses automatic parallelization.
3. Deadlock is impossible when MPI is used.
4. Increasing problem size always increases speedup.
5. A barrier is a form of global synchronisation. [10 marks]
[Total for Question 4: 16 marks]
Question 5
This question is about OpenCL.
(a) Briefly explain the OpenCL memory model. [4 marks]
(b) List all steps that a host program must include, when using OpenCL
for parallel work. [9 marks]
(c) OpenCL is based on C, but with some restrictions. List two restrictions, and briefly give reasons why they may have been imposed. [4 marks]
[Total for Question 5: 17 marks]
2023-06-02