CS245  Summer 2000

Assignment 4
due in class on Wednesday August 2


PROBLEM 1 (25 points)

Assume we want to evaluate the following relational expression: 

  select_{c1}(R) join select_{c2}(S) join select_{c3}(T) 

As mentioned in class, there are many plans we can construct 
from this expression, each corresponding to a different join order. 
Moreover, there are many different algorithms for evaluating a join 
(e.g., nested loop join, merge sort join, hash join) or a selection 
(e.g., with or without using an index). That means that each query 
plan can be evaluated in many different ways, depending on the way 
each relational operator is executed. We call each such "instantiation" 
of a query plan a physical query plan. 

Assume we can choose between two different ways for doing selections 
and between three different join algorithms (we can do one selection 
one way, and another selection another way). Also assume that joins 
are asymmetric, i.e., R join S is different from S join R. Finally, 
assume that we are only considering left-deep join orders. 
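
To make the term "left-deep" concrete, here is a minimal sketch (not part 
of the problem statement) that prints the left-deep join expression for a 
given join order. The function name left_deep is hypothetical and chosen 
only for illustration; the sketch does not count plans. 

```python
from functools import reduce

def left_deep(order):
    """Build the left-deep join expression for relations joined in this order.

    Each join takes the result so far as its left input and a single
    base relation as its right input.
    """
    return reduce(lambda left, right: f"({left} join {right})", order)

print(left_deep(["R", "S", "T"]))  # ((R join S) join T)
print(left_deep(["S", "R", "T"]))  # ((S join R) join T)
```

Because joins are assumed asymmetric, the two expressions printed above 
count as different join orders. 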

What is the number of different physical query plans available for 
the above relational expression, given our assumptions? Generalize 
your answer to the case of an expression involving n selections and 
n-1 joins. 

Note that the exact selection predicates are irrelevant, and so are 
the schemata of the three relations. 


PROBLEM 2 (25 points)

Consider an extensible hash structure where buckets can hold
up to three records.  Initially the structure is empty.

Then we insert the following records, in the order below,
where we indicate the hashed key in brackets (in binary):

   a [000110]
   b [111100]
   c [101000]
   d [101111]
   e [010110]
   f [101000]   
   g [101001]
   h [011010]
   i [011010]
   j [001110]

Show the structure after these records have been inserted.
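
As a reminder of how a hashed key selects a directory entry, here is a
minimal sketch, assuming the directory is indexed by the first i
high-order bits of the key, read as a binary number. The name
directory_index is hypothetical; this is an illustration, not the
required solution format.

```python
def directory_index(key: str, i: int) -> int:
    """Return the directory slot for a binary key string,
    using its first i high-order bits (slot 0 when i == 0)."""
    return int(key[:i], 2) if i > 0 else 0

# Record c has hashed key 101000.
print(directory_index("101000", 1))  # 1
print(directory_index("101000", 2))  # 2
```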


PROBLEM 3 (25 points)

For the same records, hash keys, and assumptions
as PROBLEM 2, show the *linear* hash structure for this file.
Initially the structure is empty.

Assume that the threshold value is 2, i.e., when the average
number of keys per non-overflow bucket is greater than 2,
we allocate another bucket.
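
The threshold test and the bucket addressing can be sketched as follows.
This is a minimal sketch under assumptions: the names needs_new_bucket
and bucket_address are hypothetical, and the addressing rule (use the
last i bits; if that bucket does not exist yet, drop the leading bit)
is the common textbook convention, which may differ in detail from the
one presented in class.

```python
def needs_new_bucket(num_keys: int, num_buckets: int, threshold: float = 2.0) -> bool:
    """True when the average number of keys per non-overflow bucket
    exceeds the threshold."""
    return num_keys / num_buckets > threshold

def bucket_address(key: str, i: int, num_buckets: int) -> int:
    """Bucket for a binary key string using its last i low-order bits.
    Assumed convention: if that bucket is not allocated yet,
    clear the leading one of the i bits."""
    m = int(key, 2) & ((1 << i) - 1)
    if m >= num_buckets:
        m -= 1 << (i - 1)
    return m

print(needs_new_bucket(4, 2))            # False (average is exactly 2)
print(needs_new_bucket(5, 2))            # True  (average is 2.5)
print(bucket_address("101111", 2, 3))    # 1 (bucket 11 does not exist yet)
```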


PROBLEM 4 (25 points)

Extensible hashing uses the first "i" high-order bits of the hash key.

Suppose that instead we use the last "i" low-order bits of the hash
key, and use exactly the same extensible hashing algorithm.  What
would happen?  If there is a problem, can it be fixed easily?

Also, linear hashing uses the last "i" low-order bits of the hash key.
Suppose that instead we use the first "i" high-order bits of the hash
key, and use exactly the same linear hashing algorithm.  What would
happen?