Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

ELEC7078 Advanced Databases

Assignment 1

Deadline: 30-OCT-2023 (Monday), noon (late submission will NOT be accepted)

1.     [B+-tree] (30%)

Suppose you are going to build a B+-tree (n=4) with the following search keys:

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}

(a)  Suggest a sequence to insert the keys such that the number of nodes in the final B+-tree is the

least. Show the steps and draw a new diagram whenever the height of the tree increases.

(b)  Suggest a sequence to insert the keys such that the final B+-tree satisfies the following conditions. Show the steps and draw a new diagram whenever the height of the tree increases.

.    The root has only two sub-trees.

.    All the keys in non-leaf nodes are odd numbers.

.    There are only two keys in every leaf node.

(c)  Suggest a sequence of search keys to be deleted from the final B+-tree in (b) to shrink the tree to 2 levels with the least number of deletions. Show all the steps and draw a new diagram whenever a node is deleted.

2.     [Query Processing and Optimization] (70%)

Consider the following database schema of a course enrollment system for students.

Student (student-id, local,  )

Course (course-id, class-size, … )

Enrollment (s-id, c-id, priority, record-id,  )

The attributes s-id and c-id in Enrollment are foreign keys referencing student-id in Student and course-id in Course, respectively. Suppose only the following information and statistics are available and assume

that all distributions are uniform and independent of each other.

.    Number of students (nStudent): 50,000

.    Number of courses (nCourse): 3,200

.    Number of enrollment records (nEnrollment): 800,000

.    Blocking factor of Student (fStudent): 8

.    Blocking factor of Enrollment (fEnrollment): 4

.    Percentage of local students: 85%

.    Range of class sizes: [10, 310]

.    Range of priorities: [0, 100]

.    4-level B+-tree primary index (n=100) on search-key student-id for Student

.    Static hash file (without overflow buckets) on search-key course-id for Course

.    4-level B+-tree primary index (n=100) on search-key (c-id, record-id) for Enrollment

.    3-level B+-tree secondary index (n=100) on search-key (priority, record-id) for Enrollment

Consider the following SQL query that retrieves all information about local students who enrolled in a course with a class size smaller than 190 and an enrollment priority higher than 90.

select *

from Student SCourse C, Enrollment E

where S.student-id=E.s-id and C.course-id=E.c-id and

S.local=true and C.class-size<190 and E.priority>90;

(a)   Write an unoptimized relational-algebra expression of the above SQL query.

(b)    Estimate the number of records the query returns. Explain your estimation.

(c)    Assume that only 11 memory blocks are available. By considering the cost of disk block transfers

(only), suggest one best evaluation plan for the query. In particular, you need to do the followings.

(i) Transform the unoptimized relational-algebra expression in (a) into an equivalent optimized

expression for producing your suggested evaluation plan. Show all the steps and state clearly the rule number of the equivalence rule you used in each step.

(ii) Draw the fully annotated evaluation plan, including exactly what algorithm and index, if any, are  used for each operation and how the execution of the operations is coordinated (materialization or pipelining).

(iii) Indicate in the evaluation plan the memory allocation (number of memory blocks allocated to   each input and output (bb), but the total cannot exceed the amount of memory available at one time).

(iv) Find the cost of each operation and the total cost of the whole plan in number of disk block transfers.

(v) State clearly any justifiable assumptions you have made in your estimation.