Operating Systems Quals Problems 1992 -- 2002

2001 Problem Set

Dawson Engler

Problem 1 -- Sort-of RAID (30 Points) To help pay Stanford tuition, you buy a computer that has a single flaky disk. Over time you notice one data block per file goes bad.

1. (10 Points) Ignoring system crashes and meta data corruption, explain how to compensate for one bad data block per file using error correction ideas from RAID. Note, however, that for this question, you only have a single disk and you should only need a single extra parity block per file. Recall that one simple version of RAID takes two disks D0 and D1 and creates a third "parity" disk (P) by doing a bitwise xor of all bits on D0 and D1: P = D0 XOR D1. If one disk is lost, the system can then recreate its data given the other two. At a high level, please state (1) what you must do on a disk write (i.e., what blocks you have to modify, write out, etc.) and (2) how to recover when a data or parity block goes bad. You may assume that the disk controller will give you a "bad block" error when you try to read or write a bad block.

2. (5 Points) When modifying cached blocks, you can either recompute the parity block when you evict a block, or each time you are about to modify a cached copy of the block. Under real workloads, why would you expect the latter to be significantly faster.

3. (5 Points) Assume we do not ignore crashes, but do ignore the problem of data blocks going bad during the period of crash to the end of recovery (i.e., the set of bad blocks right before a crash is exactly equal to the set of bad blocks after recovery). What order should you write out disk blocks to eliminate the chance that your file system will be in an unrecoverable state and why? (Recall: unrecoverable means you lose data, or worse, silently get wrong data.) After a crash, what does your recovery program have to do to fix up the disk?

4. (5 Points) For the previous part, you picked one order to write out the data block and parity block. Give an intuition for why the other order would work as well, assuming you did this order consistently.

5. (5 Points) Assume blocks can go bad during the period of crash to recovery. Give (1) a concrete example of a problem this causes and (2) a general statement of why we cannot handle this failure with our current scheme. Problem 2 -- I'm Only Sorry When I am Caught (20 Points) Conservative OS's such as various BSD operating systems require that whenever a physical page is allocated to hold a virtual memory page, that they can also reserve a page size shunk of swap space. Other OS's (such as Linux) use a page allocation policy called "over commit" where they find space in swap on demand, as processes need it.

1. (5 Points) What are advantages of overcommit? As part of your answer, please be concrete about what will happen on a BSD sysstem when there is no more free swap space.

2. (10 Points) The price Linux pays for these benefits is that under high memory load, it kills processes. Why does it have to do this?

3. (5 Points) What heuristics should you use to decide which processes to kill? (i.e., What are good or bad processes to kill?) Note that your decisions are workload dependant, so you'll have to make judgement about what is reasonable. Problem 3 -- Out of the Mouths of Babes (10 Points) As part of your qual preparation, you've read a lot of papers. One obvious fact is that each paper does not blast into new territory with completely new methods. Rather, OS Research follows a small set of patterns (e.g., speed research is a big one) and focuses on a small "cannon" of accepted problems (e.g., making file systems fast). Describe some number of patterns you've observed in reading systems papers as well as passing judgement on what you've observed. Some possible focusing questions (you may ignore these; they are only provided as examples of possible issues to address): what are the dominant accepted problems to work on? Which subfields are stagnating? Which problems of styles of research do you see being relevant in the next couple of decades? What are current fads that you expect to be dead in a few years? What general issues do you expect to see future research focusing on? What is the process of a new problem (e.g., multimedia scheduling) getting accepted by the systems community? What are the disadvantages of problems that can only be evaluated qualitatively rather than quantitatively? Please do not write a "war and peace" essay -- succinct intuitions are fine.

Stanford Operating Systems Quals

2001 Problem Set

Dawson Engler

Maintained by Gurmeet Singh Manku