Report Number: CSL-TR-95-670
Institution: Stanford University, Computer Systems Laboratory
Title: Design and Analysis of Update-Based Cache Coherence Protocols
for Scalable Shared-Memory Multiprocessors
Author: Glasco, David Brian
Date: June 1995
Abstract: This dissertation examines the performance difference between
invalidate-based and update-based cache coherence protocols
for scalable shared-memory multiprocessors. The first portion
of the dissertation reviews cache coherence. First, chapter 1
describes the cache coherence problem and identifies the two
classes of cache coherence protocols, invalidate-based and
update-based. The chapter also reviews bus-based protocols
and the additional requirements placed on the protocols to
extend them to scalable systems. Next, chapter 2
reviews two latency-tolerating techniques, relaxed memory
consistency models and software-controlled data prefetch, and
examines their impact on the cache coherence protocols.
Finally, chapter 3 reviews the details of three
invalidate-based protocols defined in the literature and
defines two new update-based protocols.
The second portion of this dissertation examines the
performance differences between invalidate-based and
update-based protocols. First, chapter 4 presents the
methodology used to examine the performance of the protocols.
This presentation includes a discussion of the simulation
environment, the simulated architecture and the scientific
applications. Next, chapter 5 describes and analyzes the
performance of two enhancements to the update-based cache
coherence protocols. The first enhancement, a fine-grain or
word-based synchronization scheme, combines data
synchronization with the data. This allows the system to take
advantage of the fine-grain data updates that result from
the update-based protocols. The second enhancement, a write
grouping scheme, is necessary to reduce the network traffic
generated by the update-based protocols. Next, chapter 6
presents and discusses the simulation results, which demonstrate
that update-based protocols with the two enhancements can
significantly improve the performance of the fine-grain
scientific applications examined, relative to invalidate-based
protocols. Chapter 7 examines the sensitivity of the
protocols to changes in the architectural parameters and to
migratory data. Finally, chapter 8 discusses how the choice of
protocol affects the correctness, cost and efficiency of the
cache coherence mechanism.
Overall, this work demonstrates that update-based protocols
can be used not only as a coherence mechanism, but also as a
latency-reducing and latency-tolerating technique to improve the
performance of a set of fine-grain scientific applications.
But as with other latency-reducing techniques, such as data
prefetch, the technique must be used with an understanding of
its consequences.
http://i.stanford.edu/pub/cstr/reports/csl/tr/95/670/CSL-TR-95-670.pdf
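
Illustrative sketch (not from the report): the abstract's contrast between invalidate-based and update-based protocols, and the role of write grouping, can be seen in a toy producer/consumer model. Everything here is an assumption made for illustration: the names `simulate` and `producer_consumer_trace`, the single producer and single consumer, one word per cache line, and the simplification that each miss, invalidation, or update counts as one network message. It does not reproduce the dissertation's protocols, simulator, or results.

```python
def producer_consumer_trace(words, iterations):
    """Build a hypothetical trace: the consumer reads every word once,
    then each iteration the producer rewrites all words, synchronizes,
    and the consumer reads them again."""
    trace = [("read", a) for a in range(words)]
    for _ in range(iterations):
        trace += [("write", a) for a in range(words)]
        trace.append(("sync", None))
        trace += [("read", a) for a in range(words)]
    return trace


def simulate(protocol, trace, group_size=1):
    """Return (consumer_read_misses, messages) for a toy two-node system.

    protocol   -- "invalidate" or "update"
    group_size -- for "update" only: number of writes merged into one
                  update message (a stand-in for write grouping)
    """
    if protocol not in ("invalidate", "update"):
        raise ValueError("protocol must be 'invalidate' or 'update'")
    consumer_copies = set()  # addresses currently valid in the consumer's cache
    pending = 0              # buffered updates not yet sent
    misses = 0
    messages = 0
    for op, addr in trace:
        if op == "write":
            if addr in consumer_copies:
                if protocol == "invalidate":
                    # The consumer's copy is destroyed; its next read will miss.
                    consumer_copies.discard(addr)
                    messages += 1
                else:
                    # The consumer's copy stays valid; buffer the new value.
                    pending += 1
                    if pending == group_size:
                        messages += 1
                        pending = 0
        elif op == "sync":
            # Flush any partially filled group at a synchronization point.
            if pending:
                messages += 1
                pending = 0
        elif op == "read":
            if addr not in consumer_copies:
                misses += 1
                messages += 1
                consumer_copies.add(addr)
    return misses, messages


if __name__ == "__main__":
    trace = producer_consumer_trace(words=4, iterations=3)
    print(simulate("invalidate", trace))           # (16, 28)
    print(simulate("update", trace))               # (4, 16)
    print(simulate("update", trace, group_size=4)) # (4, 7)
```

In this toy model the update-based variant removes the consumer's repeated read misses, and grouping the writes cuts the number of update messages. Real systems differ in many ways the sketch ignores, such as line size, directory state, and updates sent to nodes that no longer use the data.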