Report Number: CSL-TR-94-627
Institution: Stanford University, Computer Systems Laboratory
Title: An Efficient Shared Memory Layer for Distributed Memory Machines
Author: Scales, Daniel J.
Author: Lam, Monica S.
Date: July 1994
Abstract: This paper describes a system called SAM that simplifies the
task of programming machines with distributed address spaces
by providing a shared name space and dynamic caching of remotely
accessed data. SAM makes it possible to utilize the
computational power available in networks of workstations and
distributed memory machines, while retaining the ease of
programming associated with a single address space model. The
global name space and caching are especially important for
complex scientific applications with irregular communication and
parallelism.
SAM is based on the principle of tying synchronization to data
accesses. Precedence constraints are expressed by accesses to
single-assignment values, and mutual exclusion constraints are
represented by accesses to data items called accumulators.
Programmers can easily express the communication and synchronization
between processes using these operations; they can also use
alternative paradigms that are built on the SAM primitives.
Operations for prefetching data and explicitly sending data to
another processor integrate cleanly with SAM's shared memory
model and allow the user to obtain the efficiency of message
passing when necessary.
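As a rough illustration of these two kinds of synchronizing data
objects, the sketch below models a single-assignment value and an
accumulator with POSIX threads on one machine. The names and the
thread-based setting are assumptions made for the example only; they
are not SAM's actual interface or implementation, which is described
in the body of this report.

/* Illustrative model of the two SAM synchronization objects described
 * above, using POSIX threads on a single machine for concreteness.
 * All identifiers here (sa_value, accumulator, sa_read, ...) are
 * placeholders, not SAM's real interface, and SAM itself targets
 * distributed-memory machines rather than threads. */
#include <pthread.h>
#include <stdio.h>

/* Single-assignment value: a read blocks until the (one) write occurs,
 * so the data access itself carries the precedence constraint. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  written;
    int             has_value;
    double          value;
} sa_value;

void sa_write(sa_value *v, double x) {
    pthread_mutex_lock(&v->lock);
    v->value = x;
    v->has_value = 1;
    pthread_cond_broadcast(&v->written);
    pthread_mutex_unlock(&v->lock);
}

double sa_read(sa_value *v) {
    pthread_mutex_lock(&v->lock);
    while (!v->has_value)
        pthread_cond_wait(&v->written, &v->lock);
    double x = v->value;
    pthread_mutex_unlock(&v->lock);
    return x;
}

/* Accumulator: updates are applied under mutual exclusion, so the data
 * access also carries the exclusion constraint. */
typedef struct {
    pthread_mutex_t lock;
    double          sum;
} accumulator;

void acc_add(accumulator *a, double x) {
    pthread_mutex_lock(&a->lock);
    a->sum += x;
    pthread_mutex_unlock(&a->lock);
}

sa_value    partial = { PTHREAD_MUTEX_INITIALIZER,
                        PTHREAD_COND_INITIALIZER, 0, 0.0 };
accumulator total   = { PTHREAD_MUTEX_INITIALIZER, 0.0 };

void *producer(void *arg) {
    (void)arg;
    sa_write(&partial, 42.0);          /* satisfies the dependence */
    return NULL;
}

void *consumer(void *arg) {
    (void)arg;
    double x = sa_read(&partial);      /* blocks until the producer writes */
    acc_add(&total, x);                /* exclusive update */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    printf("total = %g\n", total.sum);
    return 0;
}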
We have built implementations of SAM for the CM-5, the Intel
iPSC/860, the Intel Paragon, the IBM SP1, and heterogeneous
networks of Sun, SGI, and DEC workstations (using PVM). In this
report, we describe the basic functionality provided by SAM,
discuss our experience in using it to program a variety of
scientific applications and distributed data structures, and
provide performance results for these complex applications on a
range of machines. Our experience indicates that SAM
significantly simplifies the programming of these parallel
systems, supports the necessary functionality for developing
efficient implementations of sophisticated applications, and
provides portability across a range of distributed memory
environments.
http://i.stanford.edu/pub/cstr/reports/csl/tr/94/627/CSL-TR-94-627.pdf