闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

FIRST SEMESTER 2020/2021

FINAL EXAMINATIONS

BACHELOR DEGREE – Year 3

CPT201

DATABASE DEVELOPMENT AND DESIGN

Question 1. Answer the following questions on indexing in database systems. [25 marks]

A relation called employee has 10,000 tuples. The tuples are stored as fixed length and fixed format records; each has the length of 200 bytes. Tuples contain the key attribute name with length of 20 bytes. The tuples are stored sequentially in a number of blocks, ordered by name. Each block has the size of 4,096 bytes (i.e., 4 Kilobytes) and each tuple is fully contained in one block. Suppose that a primary index using B+ tree on the name attribute is to be created. The length of each index entry is 30 bytes: 20 bytes for the search key and 10 bytes for the pointer to data (an 8 byte block id and a 2 byte offset). Each index entry is also fully contained in one block.

a) Compute the number of tuples of the relation employee that one block can hold. [2/25]

b) Compute the number of disk blocks needed to store the relation employee. [3/25]

c) If the primary index on name is sparse, i.e., one index entry for one block, compute the number of blocks needed to store the primary index. [4/25]

d) If the primary index on name is dense, i.e., one index entry for one tuple, compute the number of blocks needed to store the primary index. [4/25]

e) Suppose at some point, the B+ tree index on name attribute is shown as follows. The number of pointers in a node is 4. Draw the B+ tree for each of following operations (in total there should be four trees): (1) insert ‘Amy’; (2) insert ‘Linda’; (3) delete ‘Linda’; (4) delete ‘Amy’ . (Each of the operations, besides (1), must be performed on the B+ tree drawn for the previous operation and (1) must be performed on the following B+ tree).

[8/25]

f) Suppose that an extendable hash index is created to keep track of the training courses that the employees have taken. Based on the hash values on the titles of the training courses in the table below and the initial hash index, draw the hash index after insertion of the following tuples (tuple is of the form “title_trainingCourses, employee.name”): (1) ‘Data mining, Amy’; (2) ‘Data mining, Linda’ .

[4/25]

Question 2. A student report management system contains the following three relations.

i) student (SID, name, department, email)

ii) produces (SID, RID)

iii) report (RID, title, year, topic)

“SId” and “RID” are the keys for relations student and report, respectively. The two attributes in relation produces are the foreign keys referencing student and report, respectively. The catalog information about the three relations is given below.

• number of tuples in student, nstudent = 7,000, stored on 300 blocks;

• number of tuples in produces, nproduces = 50,000, stored on 500 blocks;

• number of records in report, nreport = 35,000, stored on 1,000 blocks;

• index: a primary B+-tree index of height 3 on the SID attribute of the produces relation;

• number of distinct values for the attribute department in the student relation, V(student, department) = 50;

Answer the following questions. [25 marks]

a) For the query ‘find all reports which were written after 2017 and before 2019 ’, describe how it can be evaluated in the most cost effective way by using a primary B+ tree index on year and a secondary B+ tree index on year, respectively. [4/25]

join is more efficient? Justify your answer. [8/25]

c) Assume that both relations are not sorted, memory size M=10 blocks, buffer size bb =1 block. To evaluate “student ▹◃ produces” using the merge join algorithm, how many block transfers and seeks would be needed, respectively? [8/25]

d) An optimised query evaluation plan is shown below. Note that no intermediate relations need to be stored as the result of using pipelining. The B+-tree index of the produces relation is used to perform the indexed nested loop join. Assume that linear scan is used to evaluate all the selection operations. What is the total number of block transfers for the evaluation plan? Justify your answer.

[5/25]

Question 3. Answer the following questions. [25 marks]

a) Explain what problem will arise from the following schedule which uses the two-phase locking protocol.

T1	T2
lock-X(B);
read(B);
B = B − 50;
write(B);
	lock-S(A); read(A); lock-S(B);
lock-X(A);
read(A);
A = A + 50;
write(A);
unlock(B);
unlock(A).
	read(B); display(A + B); unlock(A); unlock(B).

[3/25]

b) Consider the following schedule and assume initially X=5 and Y=5. Can it be

transformed to a serial schedule? What is the value of calling display(X+Y)?

[3/25]

c) Draw the precedence diagram for the following schedule. Determine if it is conflict serialisable and justify your answer.

write(A)

read(B)

read(C)

write(B)

write(C)

read(C)

write(B)

write(C)

read(A)

[4/25]

d) Consider the same schedule in Question 3.c), is it view serialisable? If yes, what serial schedule is it equivalent to? Justify your answer. [5/25]

e) Consider the same schedule in Question 3.c), is it recoverable? Justify your answer. [4/25]

f) Consider the following transaction logs. (1) What are the transactions in the checkpoint L{}? (2) When recovering from the system crash, in the redo pass, which operations need to be redone? (3) After the redo pass, what transactions are left in the undo list? (4) In the undo pass, which operations need to be undone? (5) What logs need to be added after the recovery is successful?

Start of the logs

<T17, B, 300, 700>

<T14, B, 700, 1700>

<T12, D, 70, 20>

<T15, C, 200, 100>

<checkpoint L{…}>

<T16, A, 100, 300>

<T14, B, 700>

<T13, D, 20, 19>

ßSystem crash, start recovery

[6/25]

Question 4. Answer the following questions. [25 Marks]

a) Santa Claus stores the names of all the children in the world and the toys they want, in distributed databases. Consider the following three relations at two sites:

Site 1:

Children (id, kid_name, age, address, city)

Request (id, toy_id)

Site 2:

Toys (toy_id, toy_name, toy_color, min_age)

Assume that the query “find the names of childrenfrom the city of Beijing and their toy names” is issued to the Site 1. It is also known that only a ‘small’ number of toys was wanted by the children from the Beijing city. Describe in detail how semi-join can be used to optimise the query. [8/25]

b) Consider the following table for market basket analysis. Columns represent the items and rows represent transactions. Each cell represents whether an item is purchased in a transaction, “1” means yes and “0” means no. Compute the following:

(1) confidence (Bread ->> {Beer, Diapers})

(2) support (Milk ->> Cola)

(3) confidence ({Beer->> Milk})

(4) support (Eggs, Bread})

(5) confidence(Cola->>Milk)