Report Number: CSL-TR-96-688
Institution: Stanford University, Computer Systems Laboratory
Title: OS Support for Improving Data Locality on CC-NUMA Compute
Servers
Author: Verghese, Ben
Author: Devine, Scott
Author: Gupta, Anoop
Author: Rosenblum, Mendel
Date: February 1996
Abstract: The dominant architecture for the next generation of
cache-coherent shared-memory multiprocessors is CC-NUMA
(cache-coherent non-uniform memory architecture). These
machines are attractive as compute servers, because they
provide transparent access to local and remote memory.
However, the access latency to remote memory is 3-5 times
the latency to local memory. Given the large remote access
latencies, data locality is potentially the most important
performance issue. In compute-server workloads, where
processes move between nodes for load balancing, the OS must
migrate and replicate pages to maintain data locality.
Through trace analysis and actual runs of
realistic workloads, we study the potential improvements in
performance provided by OS-supported dynamic migration and
replication. Analyzing our kernel-based implementation of the
policy, we provide a detailed breakdown of the costs and
identify the functions that consume the most time. We study
alternatives to using full cache-miss information to drive
the policy, and show that sampling cache misses can reduce
cost without compromising performance, and that TLB misses
are an inconsistent approximation of
cache misses. Finally, our workload runs show that
OS-supported dynamic page migration and page replication can
substantially increase performance, by as much as 29% in some
workloads.
http://i.stanford.edu/pub/cstr/reports/csl/tr/96/688/CSL-TR-96-688.pdf
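
Note: As a rough illustration of why the abstract treats data
locality as the key performance issue, the sketch below computes
the average memory-access latency as a weighted mix of local and
remote accesses. Only the 3-5x remote-to-local ratio comes from
the abstract; the function name, the 100 ns local latency, and
the remote-access fractions are hypothetical values chosen for
illustration, not figures from the report.

```python
def mean_access_latency(local_ns, remote_ratio, remote_fraction):
    """Average latency when `remote_fraction` of accesses go to remote memory.

    local_ns        -- latency of a local-memory access (hypothetical units)
    remote_ratio    -- remote latency divided by local latency (3 to 5 per the abstract)
    remote_fraction -- share of accesses served by remote memory, in [0, 1]
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    remote_ns = local_ns * remote_ratio
    # Weighted average of the local and remote access costs.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    LOCAL_NS = 100.0  # assumed local latency, for illustration only
    for ratio in (3.0, 5.0):
        for fraction in (0.8, 0.2):
            latency = mean_access_latency(LOCAL_NS, ratio, fraction)
            print(f"ratio={ratio:.0f}x remote_fraction={fraction:.1f} "
                  f"mean_latency={latency:.0f} ns")
```

Under this simple model, any mechanism that converts remote
accesses into local ones, such as migrating or replicating a
page, lowers the average latency in proportion to the remote
fraction it removes.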