Stanford Distributed Systems and Networks Quals


2002 Problem Set

Mary Baker

The objective of this exam is to find out what you know about distributed systems and networks and to assess your ability to identify and develop solutions for problems that arise in this area. Note that the questions may have more than one "correct" answer, so be sure to provide justifications for your answers. State any assumptions you make when answering the questions. The points in parentheses are a rough indication of how many minutes to spend on each question. The exam is closed-book. Brief, "bullet-style" answers are welcome.

Problem 1 (15 Points) What are the advantages and disadvantages of leases [Gray & Cheriton, SOSP, 1989] with respect to callbacks, as used in AFS and Coda? Consider three cases:
  1. normal running of the system
  2. client disconnection from the system, and
  3. server failure and recovery.
Problem 2 (15 Points) Consider a TCP connection over which data is flowing from a sender to a receiver.
  1. Can a low-bandwidth reverse path from receiver to sender affect TCP performance in the forward direction? If so, how? If not, explain why not.
  2. Some TCP implementations allow the receiver to send cumulative acknowledgements, meaning the receiver only sends an ack after every n packets with n > 1. Why could this be a good thing? What could happen if n is very large?
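
For illustration only (not part of the original exam), the acknowledgement scheme described in part 2 can be sketched in Python. The function names `acks_sent` and `unacked_tail` are hypothetical, and the sketch assumes the receiver acks strictly once per n packets with no timer or other trigger, which real TCP implementations do not necessarily do.

```python
def acks_sent(num_packets: int, n: int) -> int:
    """Count the acks a receiver sends if it acks once per n packets.

    Assumption: no timer-driven acks; only full groups of n trigger an ack.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return num_packets // n


def unacked_tail(num_packets: int, n: int) -> int:
    """Count trailing packets that have not yet triggered an ack."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return num_packets % n


if __name__ == "__main__":
    for n in (1, 2, 4, 16):
        print(n, acks_sent(10, n), unacked_tail(10, n))
```

Varying n for a fixed number of packets shows how the ack count and the unacknowledged tail move in opposite directions under these assumptions.
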
Problem 3 (15 Points) You've been asked to implement an RPC library and are faced with the decision of designing a customized protocol over either UDP or TCP. What are the issues with each of these approaches?

Problem 4 (15 Points) You've designed a communication substrate hoping it will allow you to provide reliable delivery of messages to at least n − f nodes where n ≥ 3f + 1. By reliable, in this case, we mean that the sender knows the message will be received by at least that many nodes and that the payload received will be identical to what the sender sent.

Let's say you've implemented this protocol in the network drivers of your system and have verified that it works through debugging information in the network drivers. You now run a distributed fault-tolerant application on top of this system and find that, unfortunately, application messages are not reliably received by at least n − f instances of the application in the face of certain failures or attacks. Describe how this could happen. Give an example.
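
For illustration only (not part of the original exam), the node-count arithmetic in Problem 4 can be sketched in Python. This assumes the delivery threshold is read as n − f, with f the number of faulty nodes tolerated; the function names `min_nodes` and `delivery_threshold` are hypothetical.

```python
def min_nodes(f: int) -> int:
    """Smallest n that satisfies n >= 3f + 1 for a given f."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def delivery_threshold(n: int, f: int) -> int:
    """Number of nodes that must receive the message, read here as n - f."""
    if n < min_nodes(f):
        raise ValueError("requires n >= 3f + 1")
    return n - f


if __name__ == "__main__":
    for f in (1, 2, 3):
        n = min_nodes(f)
        print(f, n, delivery_threshold(n, f))
```

With f = 1, for example, the smallest system has 4 nodes and the message must reach at least 3 of them under this reading.
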

Maintained by Gurmeet Singh Manku