CS245 Summer 2000
Assignment 4
Due in class on Wednesday, August 2

PROBLEM 1 (25 points)

Assume we want to evaluate the following relational expression:

    select_{c1}(R) join select_{c2}(S) join select_{c3}(T)

As mentioned in class, there are many plans we can construct from this expression, each corresponding to a different join order. Moreover, there are many different algorithms for evaluating a join (e.g., nested loop join, merge sort join, hash join) or a selection (e.g., with or without using an index). That means that each query plan can be evaluated in many different ways, depending on how each relational operator is executed. We call each such "instantiation" of a query plan a physical query plan.

Assume we can choose between two different ways of doing selections and between three different join algorithms (we can do one selection one way, and another selection another way). Also assume that joins are asymmetric, i.e., R join S is different from S join R. Finally, assume that we are only considering left-deep join orders.

What is the number of different physical query plans available for the above relational expression, given our assumptions? Generalize your answer to the case of an expression involving n selections and n-1 joins. Note that the exact selection predicates are irrelevant, and so are the schemata of the three relations.

PROBLEM 2 (25 points)

Consider an extensible hash structure where buckets can hold up to three records. Initially the structure is empty. Then we insert the following records, in the order below, with the hashed key of each record shown in brackets (in binary):

    a [000110]
    b [111100]
    c [101000]
    d [101111]
    e [010110]
    f [101000]
    g [101001]
    h [011010]
    i [011010]
    j [001110]

Show the structure after these records have been inserted.

PROBLEM 3 (25 points)

For the same records, hash keys, and assumptions as PROBLEM 2, show the *linear* hash structure for this file. Initially the structure is empty.
Assume that the threshold value is 2 (i.e., when the average number of keys per non-overflow bucket is greater than 2, we allocate another bucket).

PROBLEM 4 (25 points)

Extensible hashing uses the first "i" high-order bits of the hash key. Suppose that instead we use the last "i" low-order bits of the hash key, and use exactly the same extensible hashing algorithm. What would happen? If there is a problem, can it be fixed easily?

Also, linear hashing uses the last "i" low-order bits of the hash key. Suppose that instead we use the first "i" high-order bits of the hash key, and use exactly the same linear hashing algorithm. What would happen?
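
Note: the two bit-selection conventions contrasted in PROBLEM 4 can be made concrete with a small sketch. It only shows how the first "i" high-order bits and the last "i" low-order bits are read off a key; it is not part of either hashing algorithm and does not answer the questions above. The function names are hypothetical, and the keys are assumed to be binary strings like those listed in PROBLEM 2.

```python
def high_order_bits(key: str, i: int) -> str:
    """Return the first i high-order (leftmost) bits of a binary key string."""
    if not 0 <= i <= len(key):
        raise ValueError("i must be between 0 and the key length")
    return key[:i]


def low_order_bits(key: str, i: int) -> str:
    """Return the last i low-order (rightmost) bits of a binary key string."""
    if not 0 <= i <= len(key):
        raise ValueError("i must be between 0 and the key length")
    # Slice from an explicit start index so that i == 0 yields an empty string.
    return key[len(key) - i:]


if __name__ == "__main__":
    key_a = "000110"  # hashed key of record a in PROBLEM 2
    print(high_order_bits(key_a, 2))  # prints 00
    print(low_order_bits(key_a, 2))   # prints 10
```

For record a [000110] with i = 2, the high-order convention selects 00 and the low-order convention selects 10, so the same record can be directed to different buckets depending on which end of the key is used.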