# CS 245 Midterm Exam

## Summer 2000

This exam is open book and open notes. You have 90 minutes to complete the exam. There are five parts in the exam with a total of 80 points. Please read the instructions for each part carefully before starting to work on the questions.

In accordance with both the letter and the spirit of the Honor Code, I have neither given nor received assistance on this examination.

Signed:____________________________________________

 Part Maximum Points I 16 II 10 III 20 IV 14 V 20 Total 80 Bonus 5

Part I. (16 points) Indicate whether each of the following statements is TRUE or FALSE. If you choose FALSE, please correct it or briefly explain why.

1. The LRU algortihm should not be used for the buffer management in B+-tree indexes.

Ans: _____________

2. Since a plain B-tree can store record pointers on the inner nodes, it is more compact than a B+-tree and incurs a lower search cost.

Ans: _____________

3. Split-records is a useful technique for organizing data records with a hybrid format.

Ans: _____________

4. The time for a request to wait to be serviced may be longer using the elevator algorithm than using the first-come-first-served algortihm.

Ans: _____________

5. It is always better to store a sequential file in as few blocks as possible, because that can reduce the disk I/Os when the file is accessed.

Ans: _____________

6. If V(R,C) = 1000 and T(R) = 10000, we can expect the selection on R on the condition C = 20 will yield a relation with with 20 tuples.

Ans: _____________

7. A second level index for an indexed sequential file is usually dense.

Ans: _____________

8. Consider relations R(A,B) and S(B,C) where T(R) = 4000 and T(S) = 2000, and B is a key on S. The expected number of tuples in R natural-join S is less than or equal to 2000.

Ans: _____________

Part II. (10 points) Questions 1-2 below refer to two relations R(A,B,C,D) and S(B,E). R contains 1,000 tuples stored in a sequential file sorted on attribute A.   S contains 20,000 tuples stored in a sequential file sorted on attribue B.   Each block can hold 10 R or S tuples.

1. Which of the following indexes is most appropriate for answering the query R natural-join S, assuming we can always store the index in memory.

(a) A dense index on R.B
(b) A sparse index on R.B
(c) A dense index on S.B
(d) A sparse index on S.B
(e) A sparse index on R.A

Ans: _____________

2. Which query benefits least from a multi-key index on R with A as the first attribute and B as the second attribute.

(a) select * from R where A = 10
(b) select * from R where B = 20
(c) select * from R where A < B
(d) select * from R where A < C
(e) select * from R where C < 100 and A = 10

Ans: _____________

Part III. (20 points) Questions 1-2 below refer to a disk with an actual (formatted) capacity of 8 gigabytes. The disk has 16 surfaces and 1024 tracks The disk rotates at 7200 rpm. The average seek time is 9 ms. The block size is 8KB.

1. Suppose that we are reading a file F that occupies exactly one entire track, and we want to estimate how long it takes to read the whole file sequentially. What key parameter is missing for doing this? Select one from the following choices:

(a) the diameter of the disk - d
(b) the average rotational latency - r
(c) the amount of overhead per track - v
(d) none

Ans: _____________

Based on your choice above, how long does it take to read the file? You may use the symbol (after each choice) to represent the parameter that you think is useful in the estimation result.

Ans: _____________

2. Now, suppose that we want to estimate how long it takes to read a block. What key parameter is missing for doing this? Select one from the following choices:

(a) the diameter of the disk - d
(b) the average rotational latency - r
(c) the amount of overhead per track - v
(d) none

Ans: _____________

Based on your choice above, how long does it take to read a block? You may use the symbol (after each choice) to represent the parameter that you think is useful in the estimation result.

Ans: _____________

Part IV. (14 points)

Consider a relation R(A,B) stored in a hash table with 1024 (210) buckets. Suppose that 20% of the queries on R have the form SELECT * from R where A = a, and 80% of the queries have the form SELECT * from R where B = b where a and b are various constants. Further suppose that a hash function h1 produces a string of nA bits based on the A value, and another hash function h2 produces a string of nB bits based on the B value. The combination of the two strings form the hash key used to index the hash table. What should the values of nA and nB be in order to minimize the average number of buckets that must be accessed to answer a query?

Ans: _____________

Part V. (20 points)

Suppose that we have 8192-byte blocks in which we store records of 614 bytes each. The block header consists of a two-byte counter (indicating the number of entries in the offset table), plus an offset table (or directory), as in Figure 3.8 in the textbook and in slide 52 of Notes 3. In the offe\set table, each record pointer is 2 bytes.

On an average day, three records per block are inserted, and one record per block is deleted. When a record is deleted, a tombstone is left in the offset table. For specificity, assume that the deletion on any day occurs before the insertions.

If a block is initially empty, on what day will there be no more room for the days insertions? (On the first day, day #1, there is no deletion, and three records are inserted.)

Ans: _____________

Bonus. (5 points)

1. (2 points) Consider a B+ tree index of order n over a file with r records. What is the minimum number of nodes that we may have to examine when searching for a unique record using its key value?

Ans: _____________

2. (3 points) Suppose you are designing a file system for a movie database. Each movie has 10 fields that always occur (e.g., title, year) and 20 optional fields that may or may not be relevant or known for a movie (e.g., special efects designer). Each field has a fixed size of 24 bytes. Assume that each optional field is relevant for a particular move with probability p.

You are consider two options for the file records:

1. A fixed-format record
2. A variable-format record where all fields are tagged using a 2-byte tag
For what range of p values is the fixed option better than the variable option according to the expected storage requirement? (hint: you may need to estimate the expected size of a record for each option.)

Ans: _____________