Stanford Distributed Systems and Networks Quals

2002 2001 2000 1999 1998 1997 1996 1995

1996 Problem Set

The objective of the exam is to find out what you know about distributed systems and networks and to assess your ability to identify and develop solutions for problems that arise in this area. Note that the questions may have many "correct" answer, so be sure to provide justifications for your answers. State any assumptions you make when answering the questions. The points in parentheses are a rough indication of how many minutes to spend on each question. The exam is closed-book. Problem 1 (30 Points) Let's say you've been given a bunch of wireless network devices. The wireless devices communicate via radio frequencies. The radios communicate peer-to-peer only. Multiple pairs of radios near each other can communicate simultaneously without much interference, since each conversation may use a different frequency. The radios synchronize with each other by scanning frequencies listening for beacons from other radios that contain information about what frequencies to use to communicate with them. There is no shared broadcast frequency, so only one radio at a time can hear another radio. The frequency a radio uses is not fixed, and in fact it changes periodically to avoid hogging a particular frequency for too long or to avoid interference on a frequency already in use. Radios send data to each other in the form of packets, and each radio packet has a header that specifies the source and destination physical network addresses. A radio will ignore an incoming packet that doesn't have its physical address as the destination.

Some researchers have already written device drivers for the radios so that you can run TCP/IP over them in a manner similar to running TCP/IP over an Ethernet subnet. However, they have been unable to implement ARP and need your help. ARP (the Address Resolution Protocol) provides a mechanism for a host to figure out another host's physical network address given its Internet (IP) address. This is done via a broadcast on an Ethernet subnet. The broadcasted request packet contains the IP address. The response contains the appropriate physical network address and can be sent directly back (unicast) to the requesting host. Generally the host whose IP address is provided is the one to answer the request, but you can change this if you'd like.

PART A: Without broadcast ability, how would you implement ARP over this radio network? Explain the motivation behind your design choices. You can assume the following things:
  1. Each host knows its own IP address and the physical address of its radio interface.
  2. Each host is able to obtain from its radio a list of the hardware addresses of other radios that are within hearing range. It cannot by itself obtain addresses that are out of its range.
  3. Hosts can forward packets to other hosts.
  4. All the radios are on the same IP subnet.
  5. You can't change how the radios operate.
PART B: How would you implement ARP if a host were not able to obtain a list of the hardware addresses of its neighbors? For this question assume that all radios have the hardware address of a particular host, and that all of them are within range of this particular host. Explain the motivation behind your design.

Problem 2 (25 Points) Consider the following two distributed file systems. Each has a server and some workstation clients of the file server. In both systems, clients may cache file data on a block-by-block basis in their main memories.

Cache coherency for file system A:

If a client wants to read data for a file and it does not have the data cached, then it must execute a read operation from the server. If the data is cached, then it may or may not have to reread the data from the server, as follows. The client will check when it last obtained the file's "last-modified date" from the server. If it has been over 30 seconds since the client obtained this information, then it re-requests it from the server. If the file's last-modified date is more recent than when the client last refreshed its cached data for the file, then the client invalidates that data in its cache and rereads the data from the file server. (The client will also have to check the last-modified data if it wants to overwrite only part of the data it has cached for a file, since it may need to re-read the cached data before modifying it.)

Any data the client writes to a file is cached but is simultaneously written through to the server's disk. The write does not complete until the data has safely reached the server's disk. The server updates the file's last modified data when it does the write. The server is stateless, in that it does not keep track of any information about which clients are caching which files.

Cache coherency for file system B:

The server is stateful, in that it does keep track of information about which clients are caching which files. The server implements "multiple readers or single writer" cache coherency amongst clients using tokens, as follows. A client with a write token is allowed to read and write a file in its cache. A client with a read token is allowed to read a file from its cache. A client without a token must obtain the correct type of token from the server before it may cache a file. If a client wishes to read a block of a file, and the file is not in its cache, the client makes a read request of the server. The server checks if some client has a write token for the file. If so, it revokes the client's write token and asks that client to flush back any modified data it has cached for the file. The server can then respond to the client asking to read the file. It gives the reading client a read token and sends it the data it desires. The client may continue to read from the data it has cached for the file for as long as it has the read token for that file.

If a client asks to write a file, and it does not have a write token, it makes a write request of the server. As above, the server then checks to see if any client has a write token. If so it asks that client to flush back its modified data and revokes its write token. The server also revokes any read tokens for the file. Clients whose read or write tokens are revoked must invalidate their cached data for the file. The client receiving the write token can then read and write the file in its cache for as long as it has the write token. Data it modifies is not written from its cache unless the server revokes its write token or unless the client runs out of room in its cache and needs to write data back to the server before removing it from the cache. The server permits multiple read tokens to exist for a single file. However, only one write token for a file may exist. While a write token exists, then no read token may exist for that file.

Compare the following aspects of the two file systems:
  1. The amount of cache coherency provided: Will clients always see the most recent data that has been written for a file someplace in the system, or is it possible for a client to read stale data?
  2. Performance: How do the systems compare in terms of latency of file I/O operations? Consider the amount of write-sharing of files between clients in the workload.
  3. Crash recovery overhead: What must the server do when recovering from a crash in each of the systems to continue providing whatever level of cache coherency is provided under normal operation?

Problem 3 (10 Points) This question looks at what layer of a system is appropriate for performing certain tasks. For each of the following cases, please comment on whether you feel the work is being performed at the appropriate level of the system and why. If you think the answer is "it depends," then explain what it depends upon and why.

Case 1: You've got a cellular telephone and you'd like to protect your top-secret calls form eavesdroppers. The cellular telephone has an encrypt button that encrypts data over the link radio link from the phone to the base station.

Case 2: You're using some radio network devices that may lose lots packets depending on the amount of radio interference in the area. To handle this, a radio acknowledges every packet it receives. If the sending radio doesn't receive an acknowledgement, it re-sends the packet.

Case 3: You're using the TCP protocol to transfer large files. TCP guarantees reliable, sequential packet delivery. If packets are lost due to congestion, the protocol will retransmit them. Packets have sequence numbers so that they can be ordered. The application you're using also has a file checksum that it transfers with the file, so that the destination application can compare this checksum with the file it receives. If the checksums don't match, the file may be corrupted and the application can ask for the file to be retransmitted.

Maintained by Gurmeet Singh Manku