Hints for Computer System Design

Lampson

One line summary:

Lampson's hints for system design gathered from his experience with design, implementation and study of many systems.

Hints:

Why a hint helps in making a good system: functionality, speed or fault tolerance.

Where in the system design it helps: completeness, interface or implementation.

Functionality hints:

Do one thing at a time, and do it well: an interface that attempts to do too much can become complicated and slow. An interface should provide a minimal set of features and add specific ones only when they won't penalize clients who don't need them. A seldom-used interface, however, can sacrifice some performance for functionality.
Get it right: neither abstraction nor simplicity is a substitute for getting it right.
Make it fast, rather than general or powerful: clients' performance suffers from features they don't want. e.g. RISC vs. CISC.
Don't hide power: hide undesirable properties with abstraction, not desirable ones.
Use procedure arguments to provide flexibility in an interface: rather than providing many different functions that do essentially the same thing, let the client pass in a procedure.
Leave it to the client: An interface can combine simplicity, flexibility and high performance by solving only one problem and leaving the rest to the client. e.g. monitors and wake-up using condition variables. e.g. UNIX streams, pipes etc.
Keep basic interfaces stable: don't change a basic interface if you can avoid it; otherwise the whole system can break down, because almost everything depends on it.
Keep a place to stand if you do have to change interfaces: e.g. a compatibility package that implements the old interface on top of the new one.
Plan to throw an implementation away: it probably takes one or more prototypes to get the right one. e.g. Multics -> UNIX.
Keep secrets of the implementation: secrets are assumptions about an implementation that clients are not allowed to make. Keeping secrets has a cost, though: a client that is allowed to make more assumptions can often get better performance.
Divide and conquer.
Use a good idea again: e.g. replication.
Handle normal and worst case separately: the normal case must be fast; the worst case must make some progress. e.g. it can be more reasonable to let the system crash when a deadlock is detected than to sacrifice x% of performance to avoid deadlock.
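
The "use procedure arguments" hint above can be illustrated with a minimal sketch. The function name select and the sample file records are made up for this example and do not come from the paper; the point is only that a single function taking a client-supplied predicate stands in for a family of near-duplicate functions.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def select(items: Iterable[T], keep: Callable[[T], bool]) -> List[T]:
    """Return the items for which the client-supplied predicate holds."""
    return [item for item in items if keep(item)]


# Hypothetical records: (name, size in bytes).
files: List[Tuple[str, int]] = [("a.txt", 120), ("b.log", 4096), ("c.txt", 9000)]

# Each client states its own criterion; the interface itself stays small.
large_files = select(files, lambda f: f[1] > 1000)
text_files = select(files, lambda f: f[0].endswith(".txt"))
```

Without the predicate argument, the interface would need something like a separate function for each criterion (by size, by extension, and so on), each doing essentially the same scan.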

Speed hints:

Split resources in a fixed way if in doubt, rather than sharing them: it is usually faster to allocate and access dedicated resources. e.g. co-processors, specialized hardware.
Use static analysis *if you can*: sometimes dynamic analyses are easier to do.
Use dynamic translation: translate on demand from a convenient representation to a form that can be quickly interpreted.
Cache answers: balance how much you cache against the cost of keeping the cached contents valid.
Use hints: hints are not always the truth, but they can help speed things up. e.g. routing tables in nodes are hints to where to forward a packet.
Use brute force when in doubt: a straightforward solution that spends a lot of special-purpose hardware or computing cycles is often better than a complex algorithm.
Compute in background: e.g. write back dirty pages in the background. Try to avoid too much synchronization because it can lead to subtle bugs.
Use batch processing if possible: doing things incrementally usually costs more.
Shed load to control demand, and don't let the system get overloaded: e.g. drop packets if you can't handle them.
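
The difference between a hint and a cached answer can be sketched as follows. The Locator class, the server names and the slow_lookups counter are all hypothetical, chosen only for illustration. A hint may be wrong, so it is checked against the truth before it is used, and there is always a slower, authoritative path to fall back on.

```python
from typing import Dict, Set


class Locator:
    """Finds which server holds a name, using hints that may be stale."""

    def __init__(self, servers: Dict[str, Set[str]]):
        self.servers = servers            # truth: server -> names it holds
        self.hints: Dict[str, str] = {}   # guesses: name -> server
        self.slow_lookups = 0             # counts uses of the slow path

    def _search(self, name: str) -> str:
        """Slow, authoritative path: ask every server in turn."""
        self.slow_lookups += 1
        for server, names in self.servers.items():
            if name in names:
                return server
        raise KeyError(name)

    def locate(self, name: str) -> str:
        guess = self.hints.get(name)
        # The hint is only trusted after a cheap check against the truth.
        if guess is not None and name in self.servers.get(guess, set()):
            return guess
        server = self._search(name)
        self.hints[name] = server         # remember the answer as a new hint
        return server


servers = {"s1": {"x"}, "s2": {"y"}}
locator = Locator(servers)
first = locator.locate("x")    # slow path; records a hint
second = locator.locate("x")   # fast path; the hint checks out
servers["s1"].remove("x")      # "x" moves, so the hint is now stale
servers["s2"].add("x")
third = locator.locate("x")    # the check fails; falls back to the slow path
```

A stale hint costs one failed check and one slow lookup, but never a wrong answer. A cache, by contrast, has to be kept valid, because its contents are used without being checked.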

Fault tolerance:

Designing a system with reliability in mind is not that hard, but adding reliability to an existing system is very difficult.

End-to-end recovery: error recovery at the end points is absolutely necessary; checks at any level in between are needed only to improve performance. e.g. verify that data was transferred successfully over a network at the end points; checking at the intermediate nodes as well can reduce the amount of work repeated. Problems with the end-to-end strategy: performance suffers, and it requires a cheap test for success.
Log updates: log your actions so you can recover from crashes.
Make actions atomic or restartable: an atomic action either completes or has no effect, so crash recovery does not need to deal with intermediate states. A restartable action can be redone from the beginning after a crash with the same result.
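
The last two hints can be combined in a small sketch: a key-value store that logs every update before applying it and rebuilds its state by replaying the log. The LoggedStore class, the JSON record format and the file name are assumptions made for this example, not anything prescribed by the paper. Each record means "set key to value", so replaying it any number of times gives the same state (restartable), and a record that was only partly written before a crash is discarded, so an update is either fully present or absent (atomic).

```python
import json
import os
import tempfile


class LoggedStore:
    """Key-value store whose state is rebuilt from an append-only log."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self) -> None:
        """Replay complete records; drop a torn record left by a crash."""
        if not os.path.exists(self.log_path):
            return
        good_bytes = 0
        with open(self.log_path, "rb") as log:
            for raw in log:
                if not raw.endswith(b"\n"):
                    break                 # record was cut off mid-write
                try:
                    record = json.loads(raw)
                except ValueError:
                    break                 # unreadable record; stop replaying
                self.data[record["key"]] = record["value"]
                good_bytes += len(raw)
        # Cut the log back to the last complete record so that later
        # appends start on a clean line.
        with open(self.log_path, "r+b") as log:
            log.truncate(good_bytes)

    def put(self, key: str, value) -> None:
        line = json.dumps({"key": key, "value": value}) + "\n"
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(line)
            log.flush()
            os.fsync(log.fileno())        # log first, then apply in memory
        self.data[key] = value


path = os.path.join(tempfile.mkdtemp(), "store.log")
store = LoggedStore(path)
store.put("a", 1)
store.put("a", 2)
with open(path, "a", encoding="utf-8") as log:
    log.write('{"key": "b", "val')        # simulate a crash in mid-write
recovered = LoggedStore(path)             # replays the log after the "crash"
```

This sketch stops replaying at the first bad record and discards everything after it, which is reasonable only because records are appended one at a time; a real system would also need checksums and log compaction.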