Working sets

Key points:

The Problem: To manage memory effectively to maximize CPU utilization. Select highest degree of multiprogramming that doesn't run into excessive thrashing, and dole out memory to each running process.
Paper includes a history of research in this area. Focuses on the author's Working Set (WS) policy, and argues that it is nearly optimal.
`Working set' at time t is the set of pages which have been referenced at least once since time t-a, for some tuning parameter a. WS policy gives the program just enough memory to contain its `working set', and leaves exactly the pages in the `working set' resident.
Measuring per-process demand for memory: collect traces of a program's memory references, measure performance of the optimal lookahead memory management policy (which unfortunately can't be implemented because it looks into the future!), and use that as an upper bound on the performance possible for any implementable policy.
Theoretical models for program behavior and memory usage: locality, phase-transition models (a program undergoes long phases of very good locality, separated by short transitions of random memory accesses), measurements (2% of time is spent in transitions, 50% of page faults occur in transitions, fault rates 100 to 1000 times higher in transitions than in phases), queuing models, lots of boring theory and formulas.
Implementing optimal memory management: ``L = S criterion'' (heuristic: increase degree of multiprogramming until mean-time-between-page-faults is about equal to mean-time-to-service-a-fault), ``50 percent criterion'' (heuristic: increase degree of multiprogramming until swap device is about 50% utilized; closely related to ``L = S criterion''), global manager (adaptively sets degree of multiprogramming at optimal level suggested by heuristics), local managers (per-process manager monitors a program's `working set' and implements WS), WS empirically keeps space-time products near optimal (better than CLOCK, LRU, PFF; within 5%--30% of optimal lookahead policy).
Hardware device supporting WS: each page has counter and process identifier, a page reference sets counter to 0, clock interrupt increments every counter in the current process, when a page's counter overflows the page is considered removed from the `working set' and ought to be paged out.

Possible criticisms and areas for discussion:

The problem was motivated in the era of batch programming, where there were plenty of asynchronous jobs ready to run. Is this problem-- and controlling degree of multiprogramming-- even relevant anymore? (In other words, does thrashing ever really happen today?-- and does it ever happen because of memory contention from many programs?)
Lots of queuing theory, formal models, derived formulas. Need more performance measurements of real performance for common applications to validate them. Is memory management really a bottleneck?
Considering the main memory as a first-level cache for the swap device, Denning concentrates solely on reducing miss rates by improving block replacement strategies and per-process cache sizes. Interesting to see effect of WS on hit times (or miss times)-- is the extra complexity of managing WS worth the reduced miss rates, or does the complexity become a bottleneck? What about other strategies for reducing access times?