Working sets
Key points:
- The Problem: To manage memory effectively to maximize CPU utilization.
Select highest degree of multiprogramming that doesn't run into excessive
thrashing, and dole out memory to each running process.
- Paper includes a history of research in this area. Focuses on the author's
Working Set (WS) policy, and argues that it is nearly optimal.
- `Working set' at time t is the set of pages which have been
referenced at least once since time t-a, for some tuning parameter
a. WS policy gives the program just enough memory to contain its
`working set', and leaves exactly the pages in the `working set' resident.
- Measuring per-process demand for memory: collect traces of a program's
memory references, measure performance of the optimal lookahead memory
management policy (which unfortunately can't be implemented because it looks
into the future!), and use that as an upper bound on the performance possible
for any implementable policy.
- Theoretical models for program behavior and memory usage: locality,
phase-transition models (a program undergoes long phases of very good
locality, separated by short transitions of random memory accesses),
measurements (2% of time is spent in transitions, 50% of page faults occur in
transitions, fault rates 100 to 1000 times higher in transitions than in
phases), queuing models, lots of boring theory and formulas.
- Implementing optimal memory management: ``L = S criterion'' (heuristic:
increase degree of multiprogramming until mean-time-between-page-faults is
about equal to mean-time-to-service-a-fault), ``50 percent criterion''
(heuristic: increase degree of multiprogramming until swap device is about 50%
utilized; closely related to ``L = S criterion''), global manager (adaptively
sets degree of multiprogramming at optimal level suggested by heuristics),
local managers (per-process manager monitors a program's `working set' and
implements WS), WS empirically keeps space-time products near optimal (better
than CLOCK, LRU, PFF; within 5%--30% of optimal lookahead policy).
- Hardware device supporting WS: each page has counter and process
identifier, a page reference sets counter to 0, clock interrupt increments
every counter in the current process, when a page's counter overflows the page
is considered removed from the `working set' and ought to be paged out.
Possible criticisms and areas for discussion:
- The problem was motivated in the era of batch programming, where there
were plenty of asynchronous jobs ready to run. Is this problem-- and
controlling degree of multiprogramming-- even relevant anymore? (In other
words, does thrashing ever really happen today?-- and does it ever happen
because of memory contention from many programs?)
- Lots of queuing theory, formal models, derived formulas. Need more
performance measurements of real performance for common applications to
validate them. Is memory management really a bottleneck?
- Considering the main memory as a first-level cache for the swap device,
Denning concentrates solely on reducing miss rates by improving block
replacement strategies and per-process cache sizes. Interesting to see effect
of WS on hit times (or miss times)-- is the extra complexity of managing WS
worth the reduced miss rates, or does the complexity become a bottleneck? What
about other strategies for reducing access times?