CS245  Winter 1999

Assignment 4
due in class on Thursday February 4

PROBLEM 1

The textbook describes B+ trees (it calls them B-trees for short).
In this problem we consider an alternative, B-trees (without the "+"),
which we call here PB trees to distinguish from B+ trees
(P for "plain").

PB trees are similar to B+ trees, except that non-leaf nodes
also have record pointers.
For each key k in a non-leaf node, we have the usual tree pointers
(to tree nodes with keys less than k, and to tree nodes with
keys greater than k), and we also have a pointer to
the record with key k in the indexed file.  

Assume that all keys to be inserted into the PB tree are distinct.

Non-leaf nodes in a PB tree have a maximum of n - 1 keys,
plus 2*n - 1 pointers (n of these pointers to other tree nodes,
n - 1 of them to records). Leaf nodes have a maximum of n - 1 keys,
plus n pointers (n - 1 of these pointers identify records, the
remaining pointer is a sequence pointer).

We wish to compare PB trees to B+ trees, and also consider
a variant of PB trees. For our comparison,
assume that keys take up 4 bytes,
and pointers also take up 4 bytes.

(a) Consider a 2 level B+ tree that is completely full.
Assume that blocks are 100 bytes in size.
(Of course, in practice, blocks will be much larger.
We use a small number to keep this problem simple.)
What is the maximum number of records that this tree can index?

(b) Consider a 2 level PB tree that is completely full.
Assume now that non-leaf blocks are 148 bytes,
while leaf blocks are 100 bytes.
What is the maximum number of records that this tree can index?

(c) In case (b) we are giving PB trees an advantage because
we allow larger blocks for non-leaf nodes.
Now assume that we force all PB nodes to fit in 100 bytes.
(We still have record pointers in non-leaf nodes, but we will
have to have fewer keys since we have less space now.)
What is the maximum number of records that a 2-level full index
of this form can handle?

(d) What can you conclude from this comparison?


PROBLEM 2

Consider an extensible hash structure where buckets can hold
up to three records.
Initially the structure is empty.
Then we insert the following records, in the order below,
where we indicate
the hashed key in parenthesis (in binary):

   a [000110]
   b [111100]
   c [010111]
   d [010000]
   e [101001]
   f [010111]
   g [101001]
   h [011010]
   i [011010]
   j [001110]

Show the structure after these records have been inserted.

PROBLEM 3

For the same records, hash keys, and assumptions
as PROBLEM 2, show the *linear* hash structure for this file.
Initially the structure is empty.
Assume that the threshold value is 2.  (i.e., when the average
number of keys per non-overflow bucket is greater than  
2, we allocate another bucket).

PROBLEM 4

Extensible hashing uses the first "i" high-order bits of the hash
 key.
Suppose that instead we use the last "i" low-order bits of the hash
key, and use exactly the same extensible hashing algorithm.  What
would happen?  If there is a problem, can it be fixed easily?

Also, linear hashing uses the last "i" low-order bits of the hash key.
Suppose that instead we use the first "i" high-order bites of the hash
key, and use exactly the same linear hashing algorithm.  What would
happen?

END OF HOMEWORK 4