||CS245 PROBLEM SET #2
Due by Wednesday, July 19, 2000
Please keep your answers concise and clear. We may take off points for
extremely lengthy and messy answers.
Problem 1: Conventional Indexes (20 points)
Consider two relations R(A, B) and S(B, C)
stored as sequential files. There are two types of most frequently
asked queries as follows.
Suppose that we are allowed to build conventional dense, sparse, or multi-level
indexes over the relations to optimize the above queries. What indexes shall
we build over R and S to obtain the best query performance?
Briefly specify the following aspects for each index you choose to build.
- select * from R where R.A > a1 and R.A < a2
- select * from R, S where R.B = S.B
- What is the type of the index (dense, sparse, or multi-level)?
- Which attribute is the index built on?
- Is it a primary or a secondary index?
- How should we use the index for the queries?
Problem 2: B+-tree and B-tree (20 points)
Consider a DBMS that has the following characteristics:
We want to build an index on a search key that is 8 bytes long.
Calculate the maximum number of records we can index with
- 2KB fixed-size blocks
- 12-byte pointers
- 56-byte block headers
- a 3-level B+ tree (2 levels plus the root)
- a 3-level B tree
Problem 3: B+-tree Insertions and Deletions (20 points)
Consider the above B+ tree.
Show the B+ tree that results after inserting (in the given order) 56, 50, 75,
87, 48. From that resulting B+ tree we delete (in the given order) 50, 24, 65,
93, 75. What is the resulting final B+ tree?
Problem 4: Extensible and Linear Hashing (20 points)
Consider a dynamic hash structure where buckets can hold up to three records.
Initially the structure is empty. Then we insert the following records, in the
order below, where we indicate the hashed key in parenthesis (in binary):
- Show the extensible hash structure after these records have been inserted.
- Show the linear hash structure after these records have been inserted.
Assume that the threshold value is 2. (i.e., when the average number of keys
per non-overflow bucket is greater than 2, we allocate another bucket).
Problem 5: Multiple-Key Index (20 points)
There is a relation R(X, Y, Z) where the pair of attributes
X and Y together form the key. Suppose that for each
X there are records with 1024 different values of Y,
and for each Y there are records with 1024 different values of
X. We need to answer queries of the form "select Z from R where
C", where C is the selection condition. Suppose we know that
for various constants x and y.
We want to build an index for the relation to improve the average query
performance, and we have the following options:
- C is "X = x" in 20% of the cases
- C is "Y = y" in 30% of the cases
- C is "X = x and Y = y" in 50% of the cases
Both the index and the relation are stored on disk. Suppose that each
X or Y value is 8 bytes long, a record pointer is 8 bytes
long, and a block is 8K bytes. Compute the average query cost (number of block
I/Os) for each index option, and pick the best index.
- Build a multiple-key index with X as the first attribute
- Build a multiple-key index with Y as the first attribute
- Build two secondary indexes on X and Y, using the
"pointer bucket" technique (described in the lecture) to deal with duplicate