闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CS 564 Final Exam Spring 2018

Answers

A: RELATIONAL ALGEBRA, SQL & NORMALIZATION [22pts]

I. [5pts] Consider a relational schema with two relations, R(A, B) and S(B, C, D), and the following query in Relational Algebra:

q = mA (R ba aC=10 (S))

Which of the following queries are equivalent to q? Clearly circle all the correct options and only the correct options.

(a) mA (aC=10 (R ba S))

(b) mA (R) ba mA (aC=10 (S))

(d) mA (R ba S) x mA (R ba aC 10 (S))

(e) mA ((R ba S) x (R ba aC 10 (S)))

ANSWER: (a), (e)

II. [9pts] Consider a relation R(A, B, C, D, E) with the following functional dependencies:

A 二 B, C C 二 D

For every one of the following decompositions, write YES/NO to answer whether it is lossless-join, dependency-preserving, or the result of a BCNF decomposition.

lossless-join

dependency-preserving

result of BCNF

ABCD, DE
ABC, ADE
ABC, CD, AE

ANSWER:

lossless-join

dependency-preserving

result of BCNF

ABCD, DE	NO	YES	NO
ABC, ADE	YES	NO	YES
ABC, CD, AE	YES	YES	YES

III. [8pts] Consider the following schema:

Customer (cid, ﬁrstname, lastname, email)

Booking (bid, date, cid, tid, partySize)

Booking.cid is a foreign key referring to Customer.cid.

Suppose we want to ask the following SQL query:

SELECT C .lastname

FROM Customer C

WHERE EXISTS (

SELECT *

FROM Booking B

WHERE B .partySize > 10 AND B .cid = C .cid);

Write an equivalent SQL query that does not use nesting or the WITH clause.

ANSWER:

I. [5pts] Suppose we have a table Products with the following attributes:

● ProductID INT PRIMARY KEY

● ProductName CHARACTER(255)

● ProductDescription CHARACTER(255)

● Code INT

Suppose that we are given a query workload where the Code attribute is used both in selection conditions and joins, while the ProductID column is not. What indexes should we have on this table? Clearly circle the correct option.

(a) CLUSTERED INDEX ON ProductID; CLUSTERED INDEX ON Code (b) UNCLUSTERED INDEX ON ProductID; CLUSTERED INDEX ON Code (c) CLUSTERED INDEX ON ProductID; UNCLUSTERED INDEX ON Code

(d) UNCLUSTERED INDEX ON ProductID; UNCLUSTERED INDEX ON Code ANSWER: (b)

II. [5pts] Consider the relation R(A, B, C, D) and suppose we want to evaluate the follow- ing selection predicate:

((A = 10) OR (A > 10)) AND (C = 10)

Which of the following indexes matches the predicate? Clearly circle all the correct op- tions and only the correct options.

(a) hash index on (A)

(b) hash index on (C)

(d) B+ tree index on (A, C)

(e) B+ tree index on (C, B)

ANSWER: (b), (d), (e)

III. [5pts] Consider a relation R(A, B, C, D) with 1, 000, 000 tuples. Suppose that attribute A can take 10 distinct values, and each value needs 15 bits to be represented. Which one between a bitmap and bitslice index on A needs less space? Explain your answer in detail.

ANSWER: The bitmap index needs 11 bits per record, while the bitslice index needs 16 bits per records. So the bitmap index takes less space.

IV. [13pts] Consider the following B+ tree that has order d = 2 (so every node can hold at most 4 search key values):

(a) [10pts] Draw the B+ tree that results from inserting a data entry with key 3.

(b) [3pts] How many page reads and page writes does the insertion require?

ANSWER: 3 reads and 5 writes

C: OPERATOR IMPLEMENTATION [20pts]

I. [10pts] We are given a relation R with N = 500 pages that we want to sort. The buffer pool has size B = 20. Fortunately, the ﬁrst 300 pages of the relation are already sorted for us (that is, they form a sorted run). What is the I/O cost of sorting the relation using the external sort algorithm? Explain your answer in detail.

Assume that we do not use replacement sort for any initial sorting. You should include in your computation the cost of writing the ﬁnal sorted result to disk.

ANSWER: In pass 0, we only need to create sorted runs for 200 pages, so we need 200 + 200 = 400 I/Os. Then, we need one more pass to merge (there are 10+1 runs, so they ﬁt in the buffer). Total cost = 400 + 2 · 500 = 1400 I/Os.

II. [10pts] We are given two relations: R with 1,000 pages and S with 2,000 pages. We are performing a key-foreign key join of R and S wherein S has the foreign key attribute. Find values of the buffer pool size B where SMJ has a smaller I/O cost than BNLJ, and where BNLJ has a smaller I/O cost than SMJ. Explain your answer in detail.

SMJ > BNLJ

SMJ < BNLJ

buffer size

SMJ > BNLJ

SMJ < BNLJ

buffer size

1100

240

D: QUERY OPTIMIZATION [20pts] Consider the following database schema:

Company (cid, cname, location)

Employee (eid, ename, age, cid)

Employee.cid is a foreign key referring to Company.cid

Product (pid, pname, price, cid)

Product.cid is a foreign key referring to Company.cid

Company has 1,000 tuples, and each record is 50 bytes long. Employee has 200,000 tu- ples, and each record is 20 bytes long. Product has 5,000 tuples, and each record is 20 bytes long. Each page can hold 5,000 bytes. The buffer pool has 102 frames.

I. [10pts] Consider the following SQL query:

SELECT COUNT(DISTINCT E .eid)

FROM Employee E, Company C

WHERE E .cid = C .cid ;

Propose the best possible physical plan that evaluates the above query and compute its I/O cost. Explain your answer in detail.

ANSWER: The cost is only 800 I/Os, since the join can be avoided, and the count can de done without the distinct.

II. [10pts] Consider the following SQL query:

SELECT SUM(price)

FROM Employee E, Company C, Product P

WHERE E .cid = C .cid AND P .pid = C .cid;

Propose the best possible physical plan for the above query that uses only BNLJ as a join algorithm and compute its I/O cost. (Hint: you may want to use pipelining). Explain your answer in detail.

ANSWER: We ﬁrst join Product and Company (any can be the outer relation, since both ﬁt together). Then we join with Company (intermediate result is the outer rela- tion). The I/O cost is 800 + 10 + 20 = 830.