Management of Space in Hierarchical Storage Systems

Shahram Ghandeharizadeh
Computer Science Department
University of Southern California

The past decade has witnessed a proliferation of repositories whose workload consists of queries that retrieve information. These repositories provide on-line access to vast amounts of data and serve as an integral component of many applications, e.g., library information systems, scientific applications, and the entertainment industry. Their storage subsystems are expected to be hierarchical, consisting of memory, magnetic disk drives, optical disk drives, and tape libraries. The database itself resides permanently on tape. Objects are swapped onto either the magnetic or optical disk drives on demand, and later deleted when the available space of a device is exhausted. Over time, this behavior generally fragments the disk space, resulting in a non-contiguous layout of disk-resident objects. As a consequence, the disk must reposition its read head multiple times (incurring a seek operation each time) whenever a fragmented resident object is retrieved. These additional seeks may reduce the overall performance of the system.
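The way swapping objects in and out fragments the disk can be illustrated with a minimal sketch. It is not one of the techniques described in this presentation; it rests on assumptions that do not come from the text: the disk is modeled as an array of fixed-size blocks, the least recently used object is deleted when space runs out, and a new object is written into the first free blocks found. The names Disk, access, and count_extents are hypothetical.

```python
from collections import OrderedDict


def count_extents(blocks, obj):
    """Count the contiguous runs of blocks owned by obj (one seek per run)."""
    runs = 0
    inside = False
    for owner in blocks:
        here = owner == obj
        if here and not inside:
            runs += 1
        inside = here
    return runs


class Disk:
    """Toy disk: a list of blocks, each holding an object id or None (free)."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks
        self.resident = OrderedDict()  # object id -> size, oldest first

    def access(self, obj, size):
        """Reference obj and return the number of extents read to fetch it."""
        if size > len(self.blocks):
            raise ValueError("object is larger than the disk")
        if obj in self.resident:
            # Hit: mark as most recently used.
            self.resident.move_to_end(obj)
        else:
            # Miss: delete least recently used objects until obj fits
            # (assumed policy, not taken from the text).
            while self.blocks.count(None) < size:
                victim = self.resident.popitem(last=False)[0]
                self.blocks = [None if b == victim else b for b in self.blocks]
            # Write obj into the first free blocks, wherever they are.
            need = size
            for i, owner in enumerate(self.blocks):
                if need == 0:
                    break
                if owner is None:
                    self.blocks[i] = obj
                    need -= 1
            self.resident[obj] = size
        return count_extents(self.blocks, obj)


if __name__ == "__main__":
    disk = Disk(10)
    for name, size in [("A", 3), ("B", 4), ("C", 3), ("B", 4), ("D", 5)]:
        print(name, disk.access(name, size), disk.blocks)
```

In this run, A, B, and C fill the disk contiguously. Re-referencing B protects it from deletion, so loading D removes A and C and leaves free space on both sides of B. D is then split into two extents, and every later read of D incurs an extra seek.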

This presentation describes four alternative techniques to manage the available space of mechanical devices in such hierarchical storage systems. Conceptually, these techniques can be categorized according to how they address several factors, including: 1) the fragmentation of disk-resident objects, 2) the amount of wasted space, and 3) adaptation to the evolving access pattern of an application. We identify these factors and demonstrate their impact using a simulation study.