DB Seminar (CS 545) talk

Next Generation Scalable Shared Storage Architectures

Garth Gibson
School of Computer Science, Carnegie Mellon University (on leave as CTO of Panasas Inc.)

Abstract

Scalability in storage starts with incremental and unbounded growth of storage capacity and the corresponding explosion of large and richly typed data sets. But scalability calls for much more than capacity from next-generation network storage. To be scalable, bandwidth (gigabits per second) and request rate (accesses per second), both in aggregate and per application, must scale with capacity growth. To achieve this, data-path and control-path bottlenecks must be eliminated and parallelism increased along all paths. But scalability is also about what must not scale with capacity: administrator management time, backup window, downtime, and the probability of data loss. And, of course, the cost per terabyte of capacity must not rise as more terabytes are installed. To meet these goals, intelligence that understands more of what is being stored must be distributed throughout client clusters and storage arrays. What that intelligence allows storage to do cost-effectively for data management applications is only beginning to be explored in industrial R&D. This talk will tour these emerging trends in next-generation scalable shared storage architectures.