Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

FIRST SEMESTER 2020/2021

FINAL EXAMINATIONS

BACHELOR DEGREE Year 3

CPT201

DATABASE DEVELOPMENT AND DESIGN

Question 1. Answer the following questions on indexing in database systems. [25 marks]

A relation called employee has 10,000 tuples. The tuples are stored as fixed length and fixed format records; each has the length of 200 bytes. Tuples contain the key attribute name with length of 20 bytes. The tuples are stored sequentially in a number of blocks, ordered by name. Each block has the size of 4,096 bytes (i.e., 4 Kilobytes) and each tuple is fully contained in one block. Suppose that a primary index using B+ tree on the name attribute is to be created. The length of each index entry is 30 bytes: 20 bytes for the search key and 10 bytes for the pointer to data (an 8 byte block id and a 2 byte offset). Each index entry is also fully contained in one block.

a)   Compute the number of tuples of the relation employee that one block can hold. [2/25]

b)  Compute the number of disk blocks needed to store the relation employee. [3/25]

c)   If the primary index on name is sparse, i.e., one index entry for one block, compute the number of blocks needed to store the primary index. [4/25]

d)  If the primary index on name is dense, i.e., one index entry for one tuple, compute the number of blocks needed to store the primary index. [4/25]

e)   Suppose at  some point, the B+ tree index on name attribute is  shown as follows. The number of pointers in a node is 4. Draw the B+ tree for each of following operations (in total there should be four trees): (1) insert Amy’; (2) insert Linda’; (3) delete Linda’; (4) delete Amy’ . (Each of the operations, besides (1), must be performed on the B+ tree drawn for the previous operation and (1) must be performed on the following B+ tree).

[8/25]

f)   Suppose that an extendable hash index is created to keep track of the training courses that the employees have taken. Based on the hash values on the titles of the training courses in the  table below  and  the  initial  hash  index,  draw  the  hash  index  after  insertion  of the following tuples (tuple is of the form “title_trainingCourses, employee.name”): (1) ‘Data mining, Amy’; (2) ‘Data mining, Linda’ .

[4/25]

Question 2. A student report management system contains the following three relations.

i) student (SID, name, department, email)

ii) produces (SID, RID)

iii) report (RID, title, year, topic)

SId”  and RID”  are  the  keys  for  relations student and report,  respectively.  The  two attributes  in  relation produces are  the  foreign  keys  referencing student and report, respectively. The catalog information about the three relations is given below.

•   number of tuples in student, nstudent = 7,000, stored on 300 blocks;

•   number of tuples in produces, nproduces = 50,000, stored on 500 blocks;

•   number of records in report, nreport = 35,000, stored on 1,000 blocks;

•   index: a primary B+-tree index of height 3 on the SID attribute of the produces relation;

•   number of distinct values for the attribute department in the student relation, V(student, department) = 50;

Answer the following questions. [25 marks]

a)   For the query find all reports which were written after 2017 and before 2019 ’, describe how it can be evaluated in the most cost effective way by using a primary B+ tree index on year and a secondary B+ tree index on year, respectively. [4/25]

join is more efficient? Justify your answer. [8/25]

c)   Assume that both relations are not sorted, memory size M=10 blocks, buffer size bb =1 block. To evaluate student ▹◃ produces” using the merge join algorithm, how many block transfers and seeks would be needed, respectively? [8/25]

d)  An optimised query evaluation plan is shown below. Note that no intermediate relations need to be  stored as the result of using pipelining. The B+-tree index of the produces relation is used to perform the indexed nested loop join. Assume that linear scan is used to evaluate all the selection operations. What is the total number of block transfers for the evaluation plan? Justify your answer.

[5/25]

Question 3. Answer the following questions. [25 marks]

a)  Explain what problem will arise from the following schedule which uses the two-phase locking protocol.

T1

T2

lock-X(B);

read(B);

B = B − 50;

write(B);

lock-S(A);

read(A);

lock-S(B);

lock-X(A);

read(A);

A = A + 50;

write(A);

unlock(B);

unlock(A).

read(B);

display(A + B);

unlock(A);

unlock(B).

[3/25]

b)  Consider  the  following  schedule  and  assume  initially  X=5  and  Y=5.  Can  it  be

transformed to a serial schedule? What is the value of calling display(X+Y)?

[3/25]

c)  Draw the precedence diagram for the following schedule. Determine if it is conflict serialisable and justify your answer.

1

2

3

4

5

6

7

8

9

10

T1

write(A)

read(B)

read(C)

write(B)

T2

T3

write(B)

write(C)

read(C)

write(B)

write(C)

read(A)

[4/25]

d)  Consider the same schedule in Question 3.c), is it view serialisable? If yes, what serial schedule is it equivalent to? Justify your answer. [5/25]

e)   Consider the same schedule in Question 3.c), is it recoverable? Justify your answer. [4/25]

f)   Consider the following transaction logs. (1) What are the transactions in the checkpoint L{}? (2) When recovering from the system crash, in the redo pass, which operations need to be redone? (3) After the redo pass, what transactions are left in the undo list? (4) In the undo pass, which operations need to be undone? (5) What logs need to be added after the recovery is successful?

Start of the logs

<T17 start>

<T17, B, 300, 700>

<T12 start>

<T17 commit>

<T14 start>

<T15 start>

<T14, B, 700, 1700>

<T12, D, 70, 20>

<T13 start>

<T15, C, 200, 100>

<T15 commit>

<T12 commit>

<T16 start>

<checkpoint L{…}>

<T16, A, 100, 300>

<T14, B, 700>

<T14 abort>

<T13, D, 20, 19>

ßSystem crash, start recovery



[6/25]

Question 4. Answer the following questions. [25 Marks]

a)   Santa Claus stores the names of all the children in the world and the toys they want, in distributed databases. Consider the following three relations at two sites:

Site 1:

Children (id, kid_name, age, address, city)

Request (id, toy_id)

Site 2:

Toys (toy_id, toy_name, toy_color, min_age)

Assume that the query find the names of childrenfrom the city of Beijing and their toy names” is issued to the Site 1. It is also known that only a small’ number of toys was wanted by the children from the Beijing city. Describe in detail how semi-join can be used to optimise the query. [8/25]

b)  Consider the following table for market basket analysis. Columns represent the items and rows represent transactions. Each cell represents whether an item is purchased in a transaction, “1” means yes and “0” means no. Compute the following:

(1) confidence (Bread ->> {Beer, Diapers})

(2) support (Milk ->> Cola)

(3) confidence ({Beer->> Milk})

(4) support (Eggs, Bread})

(5) confidence(Cola->>Milk)

Transaction

Bread

Milk

Diapers

Beer

Eggs

Cola

t1

1

1

0

0

0

0

t2

1

0

1

1

1

0

t3

0

1

1

0

0

1

t4

1

1

1

0

0

0

t5

1

1

0

1

0

1

[5/25]