BIB-VERSION:: CS-TR-v2.0 ID:: STAN//CS-TR-00-1636 ENTRY:: September 07, 2000 ORGANIZATION:: Stanford University, Department of Computer Science TITLE:: Design and Analysis of Fast Low Power SRAMs TYPE:: Thesis TYPE:: Technical Report AUTHOR:: Amrutur, Bharadwaj S. DATE:: September 2000 PAGES:: 156 ABSTRACT:: This thesis explores the design and analysis of Static Random Access Memories (SRAMs), focusing on optimizing delay and power. The SRAM access path is split into two portions: from address input to word line rise (the row decoder) and from word line rise to data output (the read data path). Techniques to optimize both of these paths are investigated. We determine the optimal decoder structure for fast low power SRAMs. Optimal decoder implementations result when the decoder, excluding the predecoder, is implemented as a binary tree. We find that skewed circuit techniques with self resetting gates work the best and evaluate some simple sizing heuristics for low delay and power. We find that the heuristic of using equal fanouts of about 4 per stage works well even with interconnect in the decode path, provided the interconnect delay is reduced by wire sizing. For fast lower power solutions, the heuristic of reducing the sizes of the input stage in the higher levels of the decode tree allows for good trade-offs between delay and power. The key to low power operation in the SRAM data path is to reduce the signal swings on the high capacitance nodes like the bitlines and the data lines. Clocked voltage sense amplifiers are essential for obtaining low sensing power, and accurate generation of their sense clock is required for high speed operation. We investigate tracking circuits to limit bitline and I/O line swings and aid in the generation of the sense clock to enable clocked sense amplifiers. The tracking circuits essentially use a replica memory cell and a replica bitline to track the delay of the memory cell over a wide range of process and operating conditions. We present experimental results from two different prototypes. Finally we look at the scaling trends in the speed and power of SRAMs with size and technology and find that the SRAM delay scales as the logarithm of its size as long as the interconnect delay is negligible. Non-scaling of threshold mismatches with process scaling, causes the signal swings in the bitlines and data lines also not to scale, leading to an increase in the relative delay of an SRAM, across technology generations. The wire delay starts becoming important for SRAMs beyond the 1Mb generation. Across process shrinks, the wire delay becomes worse, and wire redesign has to be done to keep the wire delay in the same proportion to the gate delay. Hierarchical SRAM structures have enough space over the array for using fat wires, and these can be used to control the wire delay for 4Mb and smaller designs across process shrinks. NOTES:: [Adminitrivia V1/Prg/20000907] END:: STAN//CS-TR-00-1636