CS245 PROBLEM SET #3 Due by 5:30pm, Tuesday, July 27, 1999 |
The outerjoin of relations R and S is a variation of the join (or natural join) of R and S. The outerjoin result is formed by starting with R join S, and adding any dangling tuples from R or S ( a tuple is dangling if it did not join with any tuple from the other relation). The added dangling tuples must be padded with a special NULL symbol for all attributes that they do not possess, but that appear in the join result. For example, consider R and S as follows.
R: A B S: B C ----------- ----------- 1 2 1 4 2 5 2 6R outerjoin S is
A B C ----------------------- 1 2 6 NULL 1 4 2 5 NULL
Consider three relations R(A,B), S(B,C), T(C,D) with the following statistics:
Suppose you have 2 relations, R(A, B, C) and S(B, C, D, E). You have a clustered unique (no duplicate keys) B+-tree index on attribute A for relation R. Assume this index is kept entirely in memory (i.e., you do not need to read it from disk).
For relation S, you have two indexes: (i) a non-clustered non-unique B+-tree index for attribute B, and (ii) a clustered non-unique B+-tree index for attribute C. Assume that these two indexes are also kept in memory. Also, assume that all of the tuples of S that agree on attribute C are stored in sequentially adjacent blocks on disk (that is, if more than one block is needed to store all of the tuples with some value of C, then these blocks will be sequentially located on the disk).
Other relevant data:
SELECT * FROM R, S WHERE (R.B=S.B) AND (R.C=S.C)We present you with two query plans:
Plan 1:
for every block BL of R retrieved using the clustered index on A for R do for every tuple r of BL do use the index on B for S to retrieve all of the tuples s of S such that s.B=r.B; for each of these tuples s do if (s.C=r.C) then output r.A, r.B, r.C, s.B, s.C, s.D, s.E;
Plan 2:
for every block BL of R, retrieved using the clustered index on A for R do for every tuple r of BL do use the index on C for S to retrieve all of the tuples s of S such that s.C=r.C; for each of these tuples s do if (s.B=r.B) then output r.A, r.B, r.C, s.B, s.C, s.D, s.E;
select_{c1}(R) join select_{c2}(S) join select_{c3}(T)
There are many plans we can construct from this expression, each corresponding to a different join order. Moreover, there are many different algorithms for evaluating a join (e.g., nested loop join, merge sort join, hash join) or a selection (e.g., with or without using an index). That means that each query plan can be evaluated in many different ways, depending on the way each relational operator is executed. We call each such "instantiation" of a query plan a physical query plan.
Assume we can choose between two different ways for doing selections and between three different join algorithms (we can do one selection one way, and another selection another way). Also assume that joins are asymmetric, i.e., R join S is different from S join R. Finally, assume that we are only considering left-deep join orders.
What is the number of different physical query plans available for the above relational expression, given our assumptions? Generalize your answer to the case of an expression involving n selections and n-1 joins.
Note that the exact selection predicates are irrelevant, and so are the schemata of the three relations.