BIB-VERSION:: CS-TR-v2.0
       ID:: STAN//CS-TR-00-1636
       ENTRY:: September 07, 2000
       ORGANIZATION:: Stanford University, Department of Computer Science
       TITLE:: Design and Analysis of Fast Low Power SRAMs
        TYPE:: Thesis
        TYPE:: Technical Report
        AUTHOR:: Amrutur, Bharadwaj S.
        DATE:: September 2000
        PAGES:: 156
        ABSTRACT:: This thesis explores the design and analysis of Static Random
               Access Memories (SRAMs), focusing on optimizing delay and
               power. The SRAM access path is split into two portions: from
               address input to word line rise (the row decoder) and from
               word line rise to data output (the read data path).
               Techniques to optimize both of these paths are investigated.

               We determine the optimal decoder structure for fast low power
               SRAMs. Optimal decoder implementations result when the
               decoder, excluding the predecoder, is implemented as a binary
               tree. We find that skewed circuit techniques with
               self-resetting gates work best, and we evaluate some simple
               sizing heuristics for low delay and power. We find that the
               heuristic of using equal fanouts of about 4 per stage works
               well even with interconnect in the decode path, provided the
               interconnect delay is reduced by wire sizing. For fast,
               lower-power solutions, the heuristic of reducing the sizes of
               the input stage in the higher levels of the decode tree
               allows for good trade-offs between delay and power.

               The key to low power operation in the SRAM data path is to
               reduce the signal swings on high-capacitance nodes such as
               the bitlines and the data lines. Clocked voltage sense
               amplifiers are essential for obtaining low sensing power, and
               accurate generation of their sense clock is required for high
               speed operation. We investigate tracking circuits to limit
               bitline and I/O line swings and aid in the generation of the
               sense clock to enable clocked sense amplifiers. The tracking
               circuits essentially use a replica memory cell and a replica
               bitline to track the delay of the memory cell over a wide
               range of process and operating conditions. We present
               experimental results from two different prototypes.

               Finally, we look at the scaling trends in the speed and power
               of SRAMs with size and technology and find that the SRAM
               delay scales as the logarithm of its size as long as the
               interconnect delay is negligible. Because threshold mismatches
               do not scale with process scaling, the signal swings on the
               bitlines and data lines do not scale either, which increases
               the relative delay of an SRAM across technology generations.
               The wire delay starts becoming important for
               SRAMs beyond the 1Mb generation. Across process shrinks, the
               wire delay becomes worse, and wire redesign has to be done to
               keep the wire delay in the same proportion to the gate delay.
               Hierarchical SRAM structures have enough space over the array
               for fat wires, and these can be used to control the
               wire delay for 4Mb and smaller designs across process
               shrinks.
       NOTES:: [Adminitrivia V1/Prg/20000907]
         END:: STAN//CS-TR-00-1636