(a) Since the blocks are stored sequentially, we only need a pointer to the first block. The first index block holds a block pointer to the first file block, followed by key values. The remaining index blocks hold only key values.

We can fit $\lfloor (2048 - 8)/12 \rfloor = 170$ keys in the first index block and $\lfloor 2048/12 \rfloor = 170$ keys in each of the remaining blocks.

The total number of blocks we need is

$1 + \lceil (500000 - 170)/170 \rceil = 2942$ blocks.
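The arithmetic in (a) can be checked with a short sketch. The block, key, and pointer sizes are the ones used above; the function name `contiguous_index_blocks` is a hypothetical helper introduced only for illustration.

```python
BLOCK_SIZE = 2048    # bytes per block, as in the solution above
KEY_SIZE = 12        # bytes per key
POINTER_SIZE = 8     # bytes per block pointer


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b without floating point."""
    return -(-a // b)


def contiguous_index_blocks(num_keys: int) -> int:
    """Blocks needed when only the first index block stores a pointer."""
    first = (BLOCK_SIZE - POINTER_SIZE) // KEY_SIZE   # keys in the first block
    rest = BLOCK_SIZE // KEY_SIZE                     # keys in every later block
    if num_keys <= first:
        return 1
    return 1 + ceil_div(num_keys - first, rest)


print(contiguous_index_blocks(500_000))  # 2942
```

Integer division is used for both the floor and the ceiling so that the result does not depend on floating-point rounding.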

(b) Since the blocks are not contiguous, we need a block pointer for every block. Each entry will be

8 bytes (pointer) + 12 bytes (key) = 20 bytes.

We can fit $\lfloor 2048/20 \rfloor = 102$ key-pointer pairs in a block.

The total number of blocks we need is

$\lceil 500000/102 \rceil = 4902$ blocks.

(c) Since blocks are not contiguous, we need a pointer for each block. We can fit $\lfloor 2048/20 \rfloor = 102$ key-pointer pairs in a block.

The total number of blocks we need is

$\lceil 4902/102 \rceil = 49$ blocks.
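Parts (b) and (c) apply the same rule twice: the output of the first level becomes the input of the second. A minimal sketch, assuming the 20-byte entries described above (`pair_index_blocks` is a hypothetical name):

```python
BLOCK_SIZE = 2048    # bytes per block
KEY_SIZE = 12        # bytes per key
POINTER_SIZE = 8     # bytes per block pointer


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def pair_index_blocks(num_entries: int) -> int:
    """Blocks needed when every index entry is a (key, pointer) pair."""
    pairs_per_block = BLOCK_SIZE // (KEY_SIZE + POINTER_SIZE)   # 102
    return ceil_div(num_entries, pairs_per_block)


first_level = pair_index_blocks(500_000)        # part (b)
second_level = pair_index_blocks(first_level)   # part (c)
print(first_level, second_level)  # 4902 49
```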

(d) This question can be answered in two ways, depending on whether you want a block pointer or a record pointer for each key. If duplicate keys exist, record pointers are simpler to implement than block pointers.

*Solution with block pointers:*

Size of entry = 12 (key) + 8 (pointer) = 20 bytes

We can fit $\lfloor 2048/20 \rfloor = 102$ key-pointer pairs in a block.

Number of blocks needed:

$\lceil 10000000/102 \rceil = 98040$ blocks.

*Solution with record pointers:*

Size of entry = 12 (key) + 9 (pointer) = 21 bytes

We can fit $\lfloor 2048/21 \rfloor = 97$ key-pointer pairs in a block.

Number of blocks needed:

$\lceil 10000000/97 \rceil = 103093$ blocks.

(e) Since the index blocks are contiguous, we only need a pointer to the first block and the key values. We can fit $\lfloor (2048 - 8)/12 \rfloor = 170$ keys in the first index block and $\lfloor 2048/12 \rfloor = 170$ keys in each of the remaining blocks.

*Solution with block pointers:*

The total number of blocks we need is

$1 + \lceil (98040 - 170)/170 \rceil = 577$ blocks.

*Solution with record pointers:*

The total number of blocks we need is

$1 + \lceil (103093 - 170)/170 \rceil = 607$ blocks.
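The two variants in (d) and (e) differ only in the pointer size, so both can be checked with one parameterized sketch. The sizes and the record count are those used above; the function names are hypothetical.

```python
BLOCK_SIZE = 2048          # bytes per block
KEY_SIZE = 12              # bytes per key
BLOCK_POINTER_SIZE = 8     # bytes per block pointer
RECORD_POINTER_SIZE = 9    # bytes per record pointer
NUM_RECORDS = 10_000_000   # one index entry per record, as in part (d)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def dense_index_blocks(num_records: int, pointer_size: int) -> int:
    """Part (d): blocks of (key, pointer) entries, one entry per record."""
    per_block = BLOCK_SIZE // (KEY_SIZE + pointer_size)
    return ceil_div(num_records, per_block)


def second_level_blocks(num_first_level_blocks: int) -> int:
    """Part (e): contiguous first level, so only one leading block pointer."""
    first = (BLOCK_SIZE - BLOCK_POINTER_SIZE) // KEY_SIZE
    rest = BLOCK_SIZE // KEY_SIZE
    return 1 + ceil_div(max(num_first_level_blocks - first, 0), rest)


for size in (BLOCK_POINTER_SIZE, RECORD_POINTER_SIZE):
    level1 = dense_index_blocks(NUM_RECORDS, size)
    print(size, level1, second_level_blocks(level1))
# 8 98040 577
# 9 103093 607
```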

2.

3.

**Note 1:** Diagrams by courtesy of Frank Luo.

**Note 2:** Alternate solutions are possible depending on how the keys
are redistributed after
splitting a node. All valid solutions were given full credit.

(a) The root must have at least 2 children. Each child is a leaf and has at least $\lfloor (n+1)/2 \rfloor$ record pointers.

Minimum number of record pointers = $2 \cdot \lfloor (n+1)/2 \rfloor$

(b) Again, the root has 2 children. Each non-leaf node has at least $\lceil (n+1)/2 \rceil$ pointers, so there are $2 \cdot \lceil (n+1)/2 \rceil$ leaf nodes. Each leaf node has at least $\lfloor (n+1)/2 \rfloor$ record pointers.

Minimum number of record pointers = $2 \cdot \lceil (n+1)/2 \rceil \cdot \lfloor (n+1)/2 \rfloor$

(c) Similar to (b), with *j* levels,

Minimum number of record pointers = $2 \cdot \lceil (n+1)/2 \rceil^{\,j-2} \cdot \lfloor (n+1)/2 \rfloor$
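The pattern in (a) through (c) can be written as one function of *n* and *j*. This is a sketch under the assumptions stated above (root with 2 children, minimally filled interior nodes and leaves); `min_record_pointers` is a hypothetical name, and the tree is assumed to have at least 2 levels.

```python
def min_record_pointers(n: int, j: int) -> int:
    """Minimum record pointers in a B+ tree of order n with j >= 2 levels."""
    if j < 2:
        raise ValueError("the formula assumes at least 2 levels")
    half_up = (n + 2) // 2     # ceil((n + 1) / 2): pointers per interior node
    half_down = (n + 1) // 2   # floor((n + 1) / 2): pointers per leaf
    # Root contributes a factor of 2; each of the j - 2 interior levels
    # below it multiplies the node count by half_up.
    return 2 * half_up ** (j - 2) * half_down


# Example with n = 4: ceil(5/2) = 3 and floor(5/2) = 2.
print([min_record_pointers(4, j) for j in (2, 3, 4)])  # [4, 12, 36]
```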

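(d) From (c), a B+ tree with *j* levels has at least $2 \cdot \lceil (n+1)/2 \rceil^{\,j-2} \cdot \lfloor (n+1)/2 \rfloor$ record pointers, so

$r \ge 2 \cdot \lceil (n+1)/2 \rceil^{\,j-2} \cdot \lfloor (n+1)/2 \rfloor$

Taking logarithms of both sides and solving for *j*:

$j - 2 \le \dfrac{\log r - \log 2 - \log \lfloor (n+1)/2 \rfloor}{\log \lceil (n+1)/2 \rceil}$

$j \le 2 + \dfrac{\log r - \log 2 - \log \lfloor (n+1)/2 \rfloor}{\log \lceil (n+1)/2 \rceil}$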
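The bound on *j* can be cross-checked against a direct search that uses the minimum from (c). A minimal sketch; `max_levels` and `level_bound` are hypothetical names, and *n* ≥ 2 is assumed so that the logarithm in the denominator is positive.

```python
import math


def min_record_pointers(n: int, j: int) -> int:
    """Minimum from part (c): 2 * ceil((n+1)/2)^(j-2) * floor((n+1)/2)."""
    return 2 * ((n + 2) // 2) ** (j - 2) * ((n + 1) // 2)


def max_levels(n: int, r: int) -> int:
    """Largest j whose minimum record-pointer count still fits in r."""
    if n < 2 or r < min_record_pointers(n, 2):
        raise ValueError("need n >= 2 and enough records for 2 levels")
    j = 2
    while min_record_pointers(n, j + 1) <= r:
        j += 1
    return j


def level_bound(n: int, r: int) -> float:
    """Closed-form upper bound on j from part (d); any log base works."""
    half_up = (n + 2) // 2
    half_down = (n + 1) // 2
    return 2 + (math.log(r) - math.log(2) - math.log(half_down)) / math.log(half_up)


print(max_levels(4, 100), level_bound(4, 100))
```

The integer search is the safer of the two when *r* sits exactly on a boundary, since the floating-point quotient in `level_bound` may land slightly above or below an integer.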

Common errors:

* Treating the root like an ordinary non-leaf node with $\lceil (n+1)/2 \rceil$ pointers, or assuming it can have only 1 pointer to a child node.

* Calculation errors. In particular, log 2 and ln 2 are not 1. Even when the base is 2, so that $\log_2 2 = 1$, that term must not be dropped, because log(a * b) = log a + log b.

Substituting the appropriate expressions for all the variables, we get

Ignoring

Substituting

Hence, to minimize T(m), we need to minimize

This can be solved to yield an integral value of

- Total seek and latency time: $T_a = t_s \cdot \ln(N)/\ln(m)$
- Total transfer time: $T_b = 0.02 \cdot m \cdot \ln(N)/\ln(m)$
- Total binary search time: $T_c = b \cdot \ln(N)/\ln(2)$
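With the three components listed above, only $T_a + T_b = (t_s + 0.02\,m) \cdot \ln(N)/\ln(m)$ depends on *m*, and setting its derivative to zero gives $m(\ln m - 1) = t_s/0.02$. The sketch below finds the best integer *m* by direct search. The values chosen for `t_s`, `n_records`, and `b` are hypothetical placeholders, not the ones from the problem statement, and the function names are introduced only for illustration.

```python
import math


def access_time(m: int, t_s: float, n_records: int, b: float) -> float:
    """T(m) = T_a + T_b + T_c using the three formulas listed above."""
    levels = math.log(n_records) / math.log(m)
    t_a = t_s * levels                                 # seek and latency
    t_b = 0.02 * m * levels                            # transfer
    t_c = b * math.log(n_records) / math.log(2)        # binary search
    return t_a + t_b + t_c


def best_fanout(t_s: float, n_records: int, b: float, max_m: int = 10_000) -> int:
    """Integer m in [2, max_m] that minimizes access_time."""
    return min(range(2, max_m + 1),
               key=lambda m: access_time(m, t_s, n_records, b))


# Hypothetical inputs, chosen only to exercise the search.
m_star = best_fanout(t_s=40.0, n_records=1_000_000, b=0.001)
print(m_star, m_star * (math.log(m_star) - 1))  # second value is near 40 / 0.02
```

Because $T_c$ does not depend on *m*, the choice of `b` shifts the total time but not the minimizing *m*.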