2001 Problem Set
Dawson Engler
Problem 1 -- Sort-of RAID (30 Points)
To help pay Stanford tuition, you buy a computer that has a
single flaky disk. Over time you notice one data block per file
goes bad.
1. (10 Points) Ignoring system crashes and meta data
corruption, explain how to compensate for one bad data
block per file using error correction ideas from RAID. Note,
however, that for this question, you only have a single
disk and you should only need a single extra parity
block per file.
Recall that one simple version of RAID takes two disks D0 and D1
and creates a third "parity" disk (P) by doing a bitwise xor of
all bits on D0 and D1: P = D0 XOR D1. If one disk is lost, the
system can then recreate its data given the other two.
At a high level, please state (1) what you must do on a disk
write (i.e., what blocks you have to modify, write out, etc.)
and (2) how to recover when a data or parity block goes bad. You
may assume that the disk controller will give you a "bad block"
error when you try to read or write a bad block.
2. (5 Points) When modifying cached blocks, you can
either recompute the parity block when you evict a block, or
each time you are about to modify a cached copy of the
block. Under real workloads, why would you expect the latter to
be significantly faster.
3. (5 Points) Assume we do not ignore crashes, but do
ignore the problem of data blocks going bad during the period of
crash to the end of recovery (i.e., the set of bad blocks right
before a crash is exactly equal to the set of bad blocks after
recovery). What order should you write out disk blocks to
eliminate the chance that your file system will be in an
unrecoverable state and why? (Recall: unrecoverable means you
lose data, or worse, silently get wrong data.) After a crash,
what does your recovery program have to do to fix up the disk?
4. (5 Points) For the previous part, you picked one
order to write out the data block and parity block. Give an
intuition for why the other order would work as well, assuming
you did this order consistently.
5. (5 Points) Assume blocks can go bad during the
period of crash to recovery. Give (1) a concrete example of a
problem this causes and (2) a general statement of why we cannot
handle this failure with our current scheme.
Problem 2 -- I'm Only Sorry When I am Caught (20 Points)
Conservative OS's such as various BSD operating systems require
that whenever a physical page is allocated to hold a virtual
memory page, that they can also reserve a page size shunk of
swap space. Other OS's (such as Linux) use a page allocation
policy called "over commit" where they find space in swap on
demand, as processes need it.
1. (5 Points) What are advantages of overcommit? As
part of your answer, please be concrete about what will happen
on a BSD sysstem when there is no more free swap space.
2. (10 Points) The price Linux pays for these benefits
is that under high memory load, it kills processes. Why does it
have to do this?
3. (5 Points) What heuristics should you use to decide
which processes to kill? (i.e., What are good or bad processes
to kill?) Note that your decisions are workload dependant, so
you'll have to make judgement about what is reasonable.
Problem 3 -- Out of the Mouths of Babes (10 Points)
As part of your qual preparation, you've read a lot of
papers. One obvious fact is that each paper does not blast into
new territory with completely new methods. Rather, OS Research
follows a small set of patterns (e.g., speed research is a big
one) and focuses on a small "cannon" of accepted problems (e.g.,
making file systems fast).
Describe some number of patterns you've observed in reading
systems papers as well as passing judgement on what you've
observed. Some possible focusing questions (you may ignore
these; they are only provided as examples of possible issues to
address): what are the dominant accepted problems to work on?
Which subfields are stagnating? Which problems of styles of
research do you see being relevant in the next couple of
decades? What are current fads that you expect to be dead in a
few years? What general issues do you expect to see future
research focusing on? What is the process of a new problem
(e.g., multimedia scheduling) getting accepted by the systems
community? What are the disadvantages of problems that can only
be evaluated qualitatively rather than quantitatively?
Please do not write a "war and peace" essay -- succinct
intuitions are fine.