Report Number: CSL-TR-94-604
Institution: Stanford University, Department of Computer Science
Title: Integrating multiple communication paradigms in high
performance multiprocessors
Author: Heinlein, John
Author: Gharachorloo, Kourosh
Author: Gupta, Anoop
Date: February 1994
Abstract: In the design of FLASH, the successor to the Stanford DASH
multiprocessor, we are exploring architectural mechanisms for
efficiently supporting both the shared memory and message
passing communication models in a single system. The unique
feature of the FLASH (FLexible Architecture for SHared
memory) system is the use of a programmable controller at
each node that replaces the functionality of hardwired cache
coherence state machines in systems like DASH. The base
coherence protocol is supported by executing appropriate
software handlers on the programmable controller to service
memory and coherence operations. The same programmable
controller is also used to support message passing. This
approach is attractive because of the flexibility software
provides for implementing different coherence and message
passing protocols, and because of the simplification in
system design and debugging that arises from the shift of
complexity from hardware to software.
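The idea of replacing hardwired coherence state machines with software handlers can be illustrated with a small Python model. This is a sketch under stated assumptions, not FLASH's actual handler code: the names NodeController, Request, handle_get, handle_msg_send, the request kinds "GET" and "MSG_SEND", and the 128-byte LINE_SIZE are all hypothetical. It shows one controller servicing both a coherence request and a message-passing request through the same dispatch table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

LINE_SIZE = 128  # bytes per cache line; an assumed value for illustration only


@dataclass
class Request:
    kind: str           # hypothetical request type, e.g. "GET" or "MSG_SEND"
    addr: int           # address the request refers to
    payload: bytes = b""


@dataclass
class NodeController:
    """Toy model of a programmable node controller (hypothetical names).

    Each request type is serviced by a software handler looked up in a
    dispatch table, instead of by a hardwired state machine.
    """
    memory: Dict[int, bytes] = field(default_factory=dict)
    sharers: Dict[int, set] = field(default_factory=dict)
    outbox: List[Request] = field(default_factory=list)
    handlers: Dict[str, Callable] = field(default_factory=dict)

    def register(self, kind: str, fn: Callable) -> None:
        # Installing or swapping a handler changes the protocol in software.
        self.handlers[kind] = fn

    def service(self, req: Request, src: int) -> Optional[bytes]:
        handler = self.handlers.get(req.kind)
        if handler is None:
            raise ValueError(f"no handler for {req.kind}")
        return handler(self, req, src)


def handle_get(ctrl: NodeController, req: Request, src: int) -> bytes:
    # Coherence read: record the requesting node as a sharer, return the line.
    ctrl.sharers.setdefault(req.addr, set()).add(src)
    return ctrl.memory.get(req.addr, bytes(LINE_SIZE))


def handle_msg_send(ctrl: NodeController, req: Request, src: int) -> None:
    # Message passing: queue the data for the network; the main processor
    # is not involved in this model.
    ctrl.outbox.append(req)
    return None
```

In this model, supporting a different coherence or message-passing protocol amounts to registering different handler functions, which mirrors the flexibility argument made above.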
This paper focuses on the use of the programmable controller
to support message passing. Our goal is to provide message
passing performance that is comparable to an aggressive
hardware implementation dedicated to this task. In FLASH,
message data is transferred as a sequence of cache-line-sized
units, thus exploiting the datapath support already present
for cache coherence. In addition, we avoid costly interrupts
to the main processor by having the programmable engine
handle the control of message transfers. Furthermore, in
contrast to most earlier work, we provide an integrated
solution that handles the interaction of message data with
virtual memory, protected multiprogramming, and cache
coherence. Our preliminary performance studies indicate that
this system can sustain message transfers at a rate of
several hundred megabytes per second, efficiently utilizing
the available network bandwidth.
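Transferring message data as a sequence of cache-line-sized units can be sketched as follows. This is a minimal Python model, assuming a hypothetical 128-byte LINE_SIZE and hypothetical helper names split_into_lines and transfer; it does not model virtual memory, protection, or coherence interactions, and it is not the FLASH implementation. The sender side splits a message into line-sized units tagged with their offsets, and the receiver side deposits each unit at its offset in the destination buffer.

```python
LINE_SIZE = 128  # assumed cache line size in bytes (illustrative only)


def split_into_lines(message: bytes, line_size: int = LINE_SIZE) -> list:
    """Split a message into cache-line-sized units.

    The final unit may be shorter than line_size; a real controller might
    pad it or carry an explicit length instead.
    """
    return [message[i:i + line_size] for i in range(0, len(message), line_size)]


def transfer(message: bytes, line_size: int = LINE_SIZE) -> bytes:
    """Model sending and receiving controllers moving a message line by line.

    The sending side emits one (offset, unit) packet per cache line; the
    receiving side writes each unit at its offset. In this model the main
    processor takes no part in either step.
    """
    units = split_into_lines(message, line_size)
    packets = [(i * line_size, unit) for i, unit in enumerate(units)]
    dest = bytearray(len(message))
    for offset, unit in packets:
        dest[offset:offset + len(unit)] = unit
    return bytes(dest)
```

For example, a 512-byte message becomes four 128-byte units, while a 130-byte message becomes one full unit plus a 2-byte tail.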
http://i.stanford.edu/pub/cstr/reports/csl/tr/94/604/CSL-TR-94-604.pdf