Report Number: CSL-TR-93-561
Institution: Stanford University, Computer Systems Laboratory
Title: Fetch Caches
Author: Bray, Brian K.
Author: Flynn, Michael J.
Date: February 1993
Abstract: For high performance, data caches must have a low miss rate
and provide high bandwidth while maintaining low latency.
Larger and more complex set-associative caches provide lower
miss rates, but at the cost of increased latency. Interleaved
data caches can improve the available bandwidth, but the
improvement is limited by bank conflicts and by the increased
latency of the switching networks required to distribute
cache addresses and to route the data.
We propose using a small buffer to reduce the data read
latency or improve the read bandwidth of an on-chip data
cache. We call this small read-only buffer a fetch cache. The
fetch cache attempts to capture the immediate spatial
locality of the data read reference stream by utilizing the
large number of bits that can be fetched in a single access
of an on-chip cache.
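To make the mechanism concrete, the following C sketch models
the fetch cache's behavior. The four-line, 16-byte geometry
matches the configuration evaluated below; the FIFO replacement
and invalidate-on-write policies are illustrative assumptions,
not details taken from the report.

/* Behavioral sketch of a fetch cache: a small read-only buffer
   of wide lines in front of the primary on-chip data cache. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FC_LINES      4    /* four lines, as in the evaluated configuration */
#define FC_LINE_BYTES 16   /* 16-byte-wide lines */

typedef struct {
    bool     valid[FC_LINES];
    uint32_t line_addr[FC_LINES];         /* line-aligned tag */
    uint8_t  data[FC_LINES][FC_LINE_BYTES];
    int      next;                        /* FIFO victim pointer (assumed policy) */
} fetch_cache;

/* A read that hits in the buffer can be serviced in one cycle. */
static bool fc_read_hit(const fetch_cache *fc, uint32_t addr)
{
    uint32_t line = addr / FC_LINE_BYTES;
    for (int i = 0; i < FC_LINES; i++)
        if (fc->valid[i] && fc->line_addr[i] == line)
            return true;
    return false;
}

/* On a read miss, the wide word fetched in a single access of the
   primary cache fills one fetch-cache line, capturing the spatial
   locality of subsequent nearby reads. */
static void fc_fill(fetch_cache *fc, uint32_t addr, const uint8_t *wide_word)
{
    int v = fc->next;
    fc->next = (fc->next + 1) % FC_LINES;
    fc->valid[v] = true;
    fc->line_addr[v] = addr / FC_LINE_BYTES;
    memcpy(fc->data[v], wide_word, FC_LINE_BYTES);
}

/* The buffer is read-only: writes go to the primary cache, and any
   matching fetch-cache line is invalidated here to stay consistent
   (an assumption about how coherence would be maintained). */
static void fc_write_invalidate(fetch_cache *fc, uint32_t addr)
{
    uint32_t line = addr / FC_LINE_BYTES;
    for (int i = 0; i < FC_LINES; i++)
        if (fc->valid[i] && fc->line_addr[i] == line)
            fc->valid[i] = false;
}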
There are two ways a processor can issue multiple
instructions per cache access: either the cache access
requires multiple cycles (i.e., superpipelined) or multiple
instructions are issued per cycle (i.e., superscalar). In the
first part, we show the use of fetch caches with data caches
that require multiple cycles per access. When there is a read
hit in the fetch cache, the read request is serviced in one
cycle; otherwise the latency is that of the primary data
cache. For a four-line, 16-byte-wide fetch cache, the hit
rate ranged from 40 to 60 percent, depending on the
application.
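As an illustrative calculation (the primary cache latency
here is assumed, not taken from the report): with a two-cycle
primary data cache and a 50 percent fetch cache hit rate, the
average read latency is 0.5 × 1 + 0.5 × 2 = 1.5 cycles, a 25
percent reduction.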
In the second part, we show the use of fetch caches when
multiple accesses per cycle are requested. When there is a
read hit in the fetch cache, the read can be satisfied by the
fetch cache while the primary cache performs another read or
write request. For a four-line, 16-byte-wide fetch cache, the
cache bandwidth increased by 20 to 30 percent, depending on
the application.
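In rough terms, if the fetch cache satisfies a pending read
in the same cycle that the primary cache services another
request, the requests completed per cycle rise from 1 to
1 + f, where f is the fraction of cycles in which a read hits
the fetch cache; f between 0.2 and 0.3 is consistent with the
reported 20 to 30 percent gain.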
http://i.stanford.edu/pub/cstr/reports/csl/tr/93/561/CSL-TR-93-561.pdf