I/O Optimization and Disk Architectures: A Survey
Alan Jay Smith
Overview
This paper discusses some optimizations that can be made to I/O
systems. Caveat: the assumptions made in this paper are almost 20 years
old, and many of them are out of date. Adjust your expectations accordingly.
Why is I/O optimization important?
- memory references ~50 ns, disk references ~10 ms: factor of 200000
difference
- portions of OS related to I/O are complex, so we want to minimize the
frequency and cost of I/O operations
Some I/O optimizations
- block size optimization
- small blocks need only small buffers, but require more transfers (which are
expensive in time)
- use larger blocks (at least 2-4 K)
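A rough back-of-the-envelope sketch of the block-size trade-off (all device parameters below are assumed round numbers, not figures from the paper):

```python
# Illustrative only: per-request overhead (seek + rotation) dominates for small
# blocks, so larger blocks cut total time for the same amount of data.
SEEK_MS = 25.0        # assumed average seek time
ROT_LATENCY_MS = 8.3  # assumed average rotational latency (3600 RPM disk)
XFER_KB_PER_MS = 1.0  # assumed transfer rate (~1 MB/s)

def time_to_read(total_kb, block_kb):
    """Time to read total_kb of sequentially used data in block_kb chunks,
    charging the full per-request overhead to every block (worst case)."""
    n_blocks = -(-total_kb // block_kb)  # ceiling division
    per_block = SEEK_MS + ROT_LATENCY_MS + block_kb / XFER_KB_PER_MS
    return n_blocks * per_block

for block_kb in (0.5, 1, 2, 4, 8):
    print(f"{block_kb:4} KB blocks -> {time_to_read(64, block_kb):7.1f} ms to read 64 KB")
```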
- data set placement
- data sets on the same disk should be close together (commonly used data
near the middle, others toward the edges)
- locate data sets used concurrently on different disks, if possible
- faster units should carry a disproportionately higher share of the load than
slower ones
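A hypothetical placement sketch, assuming hotter data sets should sit nearest the middle cylinders so average seek distances stay short (data set names and access counts are made up):

```python
# Hypothetical sketch: assign the most frequently used data sets to cylinders
# closest to the middle of the disk, alternating outward.
def place_by_frequency(datasets, n_cylinders):
    """datasets: list of (name, accesses_per_hour). Returns name -> cylinder."""
    middle = n_cylinders // 2
    placement = {}
    for i, (name, _) in enumerate(sorted(datasets, key=lambda d: -d[1])):
        offset = (i + 1) // 2
        placement[name] = middle + offset if i % 2 else middle - offset
    return placement

print(place_by_frequency(
    [("PAYROLL", 500), ("LOGS", 50), ("INDEX", 900), ("ARCHIVE", 5)], 800))
```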
- arm scheduling
- SSTF (shortest seek time first), SCAN, CSCAN
- most disks have only one open file, allocated and accessed sequentially,
so arm scheduling is not really necessary (this is ridiculous nowadays)
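A minimal sketch of the three arm-scheduling policies, operating on an in-memory queue of pending cylinder numbers (the request queue and head position below are made up):

```python
# Sketch of SSTF, SCAN, and C-SCAN ordering of a pending request queue.
def sstf(head, pending):
    """Shortest Seek Time First: repeatedly pick the closest pending cylinder."""
    order, rest = [], list(pending)
    while rest:
        nxt = min(rest, key=lambda c: abs(c - head))
        rest.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def scan(head, pending):
    """SCAN (elevator): sweep up from the head position, then sweep back down."""
    up = sorted(c for c in pending if c >= head)
    down = sorted((c for c in pending if c < head), reverse=True)
    return up + down

def cscan(head, pending):
    """C-SCAN: sweep up, then jump back to the low end and sweep up again."""
    up = sorted(c for c in pending if c >= head)
    wrap = sorted(c for c in pending if c < head)
    return up + wrap

queue = [98, 183, 37, 122, 14, 124, 65, 67]
for policy in (sstf, scan, cscan):
    print(policy.__name__, policy(53, queue))
```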
- rotational scheduling
- various methods are mentioned, most of which are not useful today
(rotational position sensing, track offset of head switching, folding)
- skip-sector allocation is still somewhat relevant (cf. FFS)
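A sketch of skip-sector (interleaved) allocation: consecutive logical sectors are spaced out around the track so the next one is still ahead of the head when the previous transfer completes. The track size and skip factor are illustrative:

```python
# Illustrative interleaving: lay out logical sectors with `skip` physical
# sectors between consecutive ones, probing forward if a slot is taken.
def interleave(sectors_per_track, skip):
    """Return the logical sector stored in each physical slot on one track."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        while layout[slot] is not None:          # probe forward if occupied
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + skip + 1) % sectors_per_track
    return layout

print(interleave(8, 1))   # -> [0, 4, 1, 5, 2, 6, 3, 7]
```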
- look ahead fetch and allocation
- disk blocks tend to be read in sequential runs, with run length
distribution highly skewed
- prefetch a variable number of blocks ahead, based on the current run
length
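A sketch of run-length-adaptive prefetch; the cap on how far ahead to fetch is an assumption, not a value from the paper:

```python
# Prefetch more aggressively the longer the current sequential run is.
class Prefetcher:
    def __init__(self, max_ahead=8):
        self.last_block = None
        self.run_length = 0
        self.max_ahead = max_ahead   # assumed cap on prefetch depth

    def on_read(self, block):
        """Return the blocks to prefetch after a demand read of `block`."""
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 0      # run broken; stop prefetching
        self.last_block = block
        ahead = min(self.run_length, self.max_ahead)
        return list(range(block + 1, block + 1 + ahead))

p = Prefetcher()
for b in (10, 11, 12, 13, 40, 41):
    print(b, "->", p.on_read(b))
```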
- compaction and fragmentation
- Unix-style allocation (fixed-size blocks, "randomly" distributed) is terrible,
as most files are read and written sequentially
- allocate "extents" for files: large, sequential regions of the disk
- alternatively, allocate one block at a time, but preallocate sequential
areas when sequential writing is detected (or by user cues); see the sketch
after this item
- using compaction to defragment has undesirable properties, but is in
common use
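A sketch of the sequential-write-detection idea: allocate one block at a time until the access pattern looks sequential, then reserve a whole extent so later blocks land contiguously. The extent size and trigger threshold are assumptions:

```python
# Assumed parameters: 64-block extents, preallocate after 4 sequential writes.
class ExtentAllocator:
    def __init__(self, extent_blocks=64, trigger=4):
        self.extent_blocks = extent_blocks   # size of a preallocated run
        self.trigger = trigger               # sequential writes before committing
        self.next_free = 0                   # next unallocated disk block
        self.seq_writes = 0
        self.reserved = []                   # blocks preallocated for this file

    def allocate(self, sequential):
        """Return a disk block for the next file block being written."""
        self.seq_writes = self.seq_writes + 1 if sequential else 0
        if self.reserved:
            return self.reserved.pop(0)
        if self.seq_writes >= self.trigger:
            # Looks sequential: grab a whole extent for future contiguous blocks.
            start = self.next_free
            self.next_free += self.extent_blocks
            self.reserved = list(range(start + 1, start + self.extent_blocks))
            return start
        block = self.next_free               # otherwise one block at a time
        self.next_free += 1
        return block

alloc = ExtentAllocator()
print([alloc.allocate(sequential=True) for _ in range(8)])   # contiguous blocks
```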
- I/O congestion
- provide multiple data paths for data from different disks: don't
bottleneck at the string controller, for example
- file structure
- use the right data structure (sequential file, B-tree, etc.) for the
right job
- other optimizations
- spread paging files and other frequently used data sets across devices,
controllers, and channels
- very frequently used pages should be fixed in memory
The paper also mentions "current and upcoming" device architectures, including
the brand-new fixed-sector disks, with a constant number of constant-size
blocks per track. This allows:
- all buffers to be a fixed size
- external fragmentation to be eliminated
- disk capacity to be guaranteed
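With a fixed-sector format, usable capacity is just a product of the geometry constants; a tiny illustration with a made-up geometry:

```python
# Illustrative geometry only, not a real device.
BLOCK_BYTES = 512
BLOCKS_PER_TRACK = 32
TRACKS_PER_CYLINDER = 19
CYLINDERS = 800

capacity = BLOCK_BYTES * BLOCKS_PER_TRACK * TRACKS_PER_CYLINDER * CYLINDERS
print(f"guaranteed capacity: {capacity / 2**20:.1f} MiB")
```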
Also mentioned is the possibility of trading in drums for MOS memory. The
paper does not recommend simply expanding main memory, mostly because doing so
would require major changes to the OS.