Report Number: CSL-TR-90-413
Institution: Stanford University, Computer Systems Laboratory
Title: An area-utility model for on-chip memories and its application
Author: Mulder, Johannes M.
Author: Quach, Nhon T.
Author: Flynn, Michael J.
Date: February 1990
Abstract: Utility can be defined as quality per unit of cost. The utility of a particular function in a microprocessor can be defined as its contribution to the overall processor performance per unit of implementation cost. In the case of on-chip data memory (e.g., registers, caches) the performance contribution can be reduced to its effectiveness in reducing memory traffic or in reducing the average time to fetch operands. An important cost measure for on-chip memory is occupied area. On-chip memory performance, however, is expressed much more easily as a function of size (the storage capacity) than as a function of area. Simple models have been proposed for mapping memory size to occupied area. These models, however, are of unproven validity and only apply when comparing relatively large buffers (≥ 128 words for caches, ≥ 32 words for register sets) of the same structure (e.g., cache versus cache). In this paper we present an area model for on-chip memories. The area model considers the supplied bandwidth of the individual memory cells and includes such overhead as control logic, driver logic, and tag storage, thereby permitting comparison of data buffers of different organizations and of arbitrary sizes. The model gave less than 10% error when verified against real caches and register files. Using this area-utility measure F(Performance,Area), we first investigated the performance of various cache organizations and then compared the performance of register buffers (e.g., register sets, multiple overlapping sets) and on-chip caches. Comparing cache performance as a function of area, rather than size, leads to a significantly different set of organizational tradeoffs. Caches occupy more area per bit than register buffers for sizes of 128 words or less. For data caches, line size is a primary determinant of performance for small sizes while write policy becomes the primary factor for larger caches.
For the same area, multiple register sets have poorer performance than a single register set with cache except when the memory access time is very fast (under 3 processor cycles).
http://i.stanford.edu/pub/cstr/reports/csl/tr/90/413/CSL-TR-90-413.pdf