CS245 Winter 1999 Assignment 4 due in class on Thursday February 4 PROBLEM 1 The textbook describes B+ trees (it calls them B-trees for short). In this problem we consider an alternative, B-trees (without the "+"), which we call here PB trees to distinguish from B+ trees (P for "plain"). PB trees are similar to B+ trees, except that non-leaf nodes also have record pointers. For each key k in a non-leaf node, we have the usual tree pointers (to tree nodes with keys less than k, and to tree nodes with keys greater than k), and we also have a pointer to the record with key k in the indexed file. Assume that all keys to be inserted into the PB tree are distinct. Non-leaf nodes in a PB tree have a maximum of n - 1 keys, plus 2*n - 1 pointers (n of these pointers to other tree nodes, n - 1 of them to records). Leaf nodes have a maximum of n - 1 keys, plus n pointers (n - 1 of these pointers identify records, the remaining pointer is a sequence pointer). We wish to compare PB trees to B+ trees, and also consider a variant of PB trees. For our comparison, assume that keys take up 4 bytes, and pointers also take up 4 bytes. (a) Consider a 2 level B+ tree that is completely full. Assume that blocks are 100 bytes in size. (Of course, in practice, blocks will be much larger. We use a small number to keep this problem simple.) What is the maximum number of records that this tree can index? (b) Consider a 2 level PB tree that is completely full. Assume now that non-leaf blocks are 148 bytes, while leaf blocks are 100 bytes. What is the maximum number of records that this tree can index? (c) In case (b) we are giving PB trees an advantage because we allow larger blocks for non-leaf nodes. Now assume that we force all PB nodes to fit in 100 bytes. (We still have record pointers in non-leaf nodes, but we will have to have fewer keys since we have less space now.) What is the maximum number of records that a 2-level full index of this form can handle? (d) What can you conclude from this comparison? PROBLEM 2 Consider an extensible hash structure where buckets can hold up to three records. Initially the structure is empty. Then we insert the following records, in the order below, where we indicate the hashed key in parenthesis (in binary): a [000110] b [111100] c [010111] d [010000] e [101001] f [010111] g [101001] h [011010] i [011010] j [001110] Show the structure after these records have been inserted. PROBLEM 3 For the same records, hash keys, and assumptions as PROBLEM 2, show the *linear* hash structure for this file. Initially the structure is empty. Assume that the threshold value is 2. (i.e., when the average number of keys per non-overflow bucket is greater than 2, we allocate another bucket). PROBLEM 4 Extensible hashing uses the first "i" high-order bits of the hash key. Suppose that instead we use the last "i" low-order bits of the hash key, and use exactly the same extensible hashing algorithm. What would happen? If there is a problem, can it be fixed easily? Also, linear hashing uses the last "i" low-order bits of the hash key. Suppose that instead we use the first "i" high-order bites of the hash key, and use exactly the same linear hashing algorithm. What would happen? END OF HOMEWORK 4