Report Number: CSL-TR-93-561
Institution: Stanford University, Computer Systems Laboratory
Title: Fetch Caches
Author: Bray, Brian K.
Author: Flynn, Michael J.
Date: February 1993
Abstract: For high performance, data caches must have a low miss rate
and provide high bandwidth while maintaining low latency.
Larger and more complex set-associative caches provide lower
miss rates, but at the cost of increased latency. Interleaved
data caches can improve the available bandwidth, but the
improvement is limited by bank conflicts and by the increased
latency of the switching networks required to distribute
cache addresses and to route the data.
We propose using a small buffer to reduce the data read
latency or improve the read bandwidth of an on-chip data
cache. We call this small read-only buffer a fetch cache. The
fetch cache attempts to capture the immediate spatial
locality of the data read reference stream by utilizing the
large number of bits that can be fetched in a single access
of an on-chip cache.
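To make the mechanism concrete, the following C sketch models
the fetch cache's behavior. The four-line, 16-byte geometry
matches the configuration evaluated below; the FIFO replacement
and invalidate-on-write policies are illustrative assumptions,
not details taken from the report.

/* Behavioral sketch of a fetch cache: a small read-only buffer
   of wide lines in front of the primary on-chip data cache. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FC_LINES      4    /* four lines, as in the evaluated configuration */
#define FC_LINE_BYTES 16   /* 16-byte-wide lines */

typedef struct {
    bool     valid[FC_LINES];
    uint32_t line_addr[FC_LINES];         /* line-aligned tag */
    uint8_t  data[FC_LINES][FC_LINE_BYTES];
    int      next;                        /* FIFO victim pointer (assumed policy) */
} fetch_cache;

/* A read that hits in the buffer can be serviced in one cycle. */
static bool fc_read_hit(const fetch_cache *fc, uint32_t addr)
{
    uint32_t line = addr / FC_LINE_BYTES;
    for (int i = 0; i < FC_LINES; i++)
        if (fc->valid[i] && fc->line_addr[i] == line)
            return true;
    return false;
}

/* On a read miss, the wide word fetched in a single access of the
   primary cache fills one fetch-cache line, capturing the spatial
   locality of subsequent nearby reads. */
static void fc_fill(fetch_cache *fc, uint32_t addr, const uint8_t *wide_word)
{
    int v = fc->next;
    fc->next = (fc->next + 1) % FC_LINES;
    fc->valid[v] = true;
    fc->line_addr[v] = addr / FC_LINE_BYTES;
    memcpy(fc->data[v], wide_word, FC_LINE_BYTES);
}

/* The buffer is read-only: writes go to the primary cache, and any
   matching fetch-cache line is invalidated here to stay consistent
   (an assumption about how coherence would be maintained). */
static void fc_write_invalidate(fetch_cache *fc, uint32_t addr)
{
    uint32_t line = addr / FC_LINE_BYTES;
    for (int i = 0; i < FC_LINES; i++)
        if (fc->valid[i] && fc->line_addr[i] == line)
            fc->valid[i] = false;
}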
There are two ways a processor can issue multiple
instructions per cache access: either the cache access
requires multiple cycles (i.e., superpipelined) or multiple
instructions are issued per cycle (i.e., superscalar). In the
first part, we show the use of fetch caches with data caches
that require multiple cycles per access. When there is a read
hit in the fetch cache, the read request is serviced in one
cycle; otherwise the latency is that of the primary data
cache. For a four-line, 16-byte-wide fetch cache, the hit
rate ranged from 40 to 60 percent, depending on the
application.
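As an illustrative calculation (the primary cache latency
here is assumed, not taken from the report): with a two-cycle
primary data cache and a 50 percent fetch cache hit rate, the
average read latency is 0.5 × 1 + 0.5 × 2 = 1.5 cycles, a 25
percent reduction.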
In the second part, we show the use of fetch caches when
multiple accesses per cycle are requested. When there is a
read hit in the fetch cache, the read can be satisfied by the
fetch cache while the primary cache performs another read or
write request. For a four-line, 16-byte-wide fetch cache, the
cache bandwidth increased by 20 to 30 percent, depending on
the application.
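In rough terms, if the fetch cache satisfies a pending read
in the same cycle that the primary cache services another
request, the requests completed per cycle rise from 1 to
1 + f, where f is the fraction of cycles in which a read hits
the fetch cache; f between 0.2 and 0.3 is consistent with the
reported 20 to 30 percent gain.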
http://i.stanford.edu/pub/cstr/reports/csl/tr/93/561/CSL-TR-93-561.pdf