I/O Optimization and Disk Architectures: A Survey
Alan Jay Smith
Overview
This paper discusses some optimizations that can be made to I/O
systems. Caveat: the assumptions made in this paper are almost 20 years
old, and many of them are out of date. Adjust your expectations accordingly.
Why is I/O optimization important?
- memory references ~50 ns, disk references ~10 ms: factor of 200000
difference
- portions of OS related to I/O are complex, so we want to minimize the
frequency and cost of I/O operations
Some I/O optimizations
- block size optimization
- small blocks need only small buffers, but require more transfers (which are
expensive in time)
- use larger blocks (at least 2-4 K)
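A rough back-of-the-envelope sketch of the block-size trade-off (all device parameters below are assumed round numbers, not figures from the paper):

```python
# Illustrative only: per-request overhead (seek + rotation) dominates for small
# blocks, so larger blocks cut total time for the same amount of data.
SEEK_MS = 25.0        # assumed average seek time
ROT_LATENCY_MS = 8.3  # assumed average rotational latency (3600 RPM disk)
XFER_KB_PER_MS = 1.0  # assumed transfer rate (~1 MB/s)

def time_to_read(total_kb, block_kb):
    """Time to read total_kb of sequentially used data in block_kb chunks,
    charging the full per-request overhead to every block (worst case)."""
    n_blocks = -(-total_kb // block_kb)  # ceiling division
    per_block = SEEK_MS + ROT_LATENCY_MS + block_kb / XFER_KB_PER_MS
    return n_blocks * per_block

for block_kb in (0.5, 1, 2, 4, 8):
    print(f"{block_kb:4} KB blocks -> {time_to_read(64, block_kb):7.1f} ms to read 64 KB")
```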
- data set placement
- data sets on the same disk should be close together (commonly used data
near the middle, others toward the edges)
- locate data sets used concurrently on different disks, if possible
- faster units should carry a disproportionately higher share of the load than
slower ones
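A hypothetical placement sketch, assuming hotter data sets should sit nearest the middle cylinders so average seek distances stay short (data set names and access counts are made up):

```python
# Hypothetical sketch: assign the most frequently used data sets to cylinders
# closest to the middle of the disk, alternating outward.
def place_by_frequency(datasets, n_cylinders):
    """datasets: list of (name, accesses_per_hour). Returns name -> cylinder."""
    middle = n_cylinders // 2
    placement = {}
    for i, (name, _) in enumerate(sorted(datasets, key=lambda d: -d[1])):
        offset = (i + 1) // 2
        placement[name] = middle + offset if i % 2 else middle - offset
    return placement

print(place_by_frequency(
    [("PAYROLL", 500), ("LOGS", 50), ("INDEX", 900), ("ARCHIVE", 5)], 800))
```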
- arm scheduling
- SSTF (shortest seek time first), SCAN, CSCAN
- most disks have only one open file, allocated and accessed sequentially,
so arm scheduling is not really necessary (this is ridiculous nowadays)
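A minimal sketch of the three arm-scheduling policies, operating on an in-memory queue of pending cylinder numbers (the request queue and head position below are made up):

```python
# Sketch of SSTF, SCAN, and C-SCAN ordering of a pending request queue.
def sstf(head, pending):
    """Shortest Seek Time First: repeatedly pick the closest pending cylinder."""
    order, rest = [], list(pending)
    while rest:
        nxt = min(rest, key=lambda c: abs(c - head))
        rest.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def scan(head, pending):
    """SCAN (elevator): sweep up from the head position, then sweep back down."""
    up = sorted(c for c in pending if c >= head)
    down = sorted((c for c in pending if c < head), reverse=True)
    return up + down

def cscan(head, pending):
    """C-SCAN: sweep up, then jump back to the low end and sweep up again."""
    up = sorted(c for c in pending if c >= head)
    wrap = sorted(c for c in pending if c < head)
    return up + wrap

queue = [98, 183, 37, 122, 14, 124, 65, 67]
for policy in (sstf, scan, cscan):
    print(policy.__name__, policy(53, queue))
```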
- rotational scheduling
- various methods are mentioned, most of which are not useful today
(rotational position sensing, track offset of head switching, folding)
- skip-sector allocation is still somewhat relevant (cf. FFS)
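A sketch of skip-sector (interleaved) allocation: consecutive logical sectors are spaced out around the track so the next one is still ahead of the head when the previous transfer completes. The track size and skip factor are illustrative:

```python
# Illustrative interleaving: lay out logical sectors with `skip` physical
# sectors between consecutive ones, probing forward if a slot is taken.
def interleave(sectors_per_track, skip):
    """Return the logical sector stored in each physical slot on one track."""
    layout = [None] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        while layout[slot] is not None:          # probe forward if occupied
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + skip + 1) % sectors_per_track
    return layout

print(interleave(8, 1))   # -> [0, 4, 1, 5, 2, 6, 3, 7]
```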
- look ahead fetch and allocation
- disk blocks tend to be read in sequential runs, with run length
distribution highly skewed
- prefetch a variable number of blocks ahead, based on the current run
length
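A sketch of run-length-adaptive prefetch; the cap on how far ahead to fetch is an assumption, not a value from the paper:

```python
# Prefetch more aggressively the longer the current sequential run is.
class Prefetcher:
    def __init__(self, max_ahead=8):
        self.last_block = None
        self.run_length = 0
        self.max_ahead = max_ahead   # assumed cap on prefetch depth

    def on_read(self, block):
        """Return the blocks to prefetch after a demand read of `block`."""
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 0      # run broken; stop prefetching
        self.last_block = block
        ahead = min(self.run_length, self.max_ahead)
        return list(range(block + 1, block + 1 + ahead))

p = Prefetcher()
for b in (10, 11, 12, 13, 40, 41):
    print(b, "->", p.on_read(b))
```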
- compaction and fragmentation
- Unix-style allocation (fixed-size blocks, "randomly" distributed) is terrible,
as most files are read and written sequentially
- allocate "extents" for files: large, sequential regions of the disk
- alternatively, allocate one block at a time, but preallocate sequential
areas when sequential writing is detected (or by user cues); see the sketch
after this item
- using compaction to defragment has undesirable properties, but is in
common use
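A sketch of the sequential-write-detection idea: allocate one block at a time until the access pattern looks sequential, then reserve a whole extent so later blocks land contiguously. The extent size and trigger threshold are assumptions:

```python
# Assumed parameters: 64-block extents, preallocate after 4 sequential writes.
class ExtentAllocator:
    def __init__(self, extent_blocks=64, trigger=4):
        self.extent_blocks = extent_blocks   # size of a preallocated run
        self.trigger = trigger               # sequential writes before committing
        self.next_free = 0                   # next unallocated disk block
        self.seq_writes = 0
        self.reserved = []                   # blocks preallocated for this file

    def allocate(self, sequential):
        """Return a disk block for the next file block being written."""
        self.seq_writes = self.seq_writes + 1 if sequential else 0
        if self.reserved:
            return self.reserved.pop(0)
        if self.seq_writes >= self.trigger:
            # Looks sequential: grab a whole extent for future contiguous blocks.
            start = self.next_free
            self.next_free += self.extent_blocks
            self.reserved = list(range(start + 1, start + self.extent_blocks))
            return start
        block = self.next_free               # otherwise one block at a time
        self.next_free += 1
        return block

alloc = ExtentAllocator()
print([alloc.allocate(sequential=True) for _ in range(8)])   # contiguous blocks
```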
- I/O congestion
- provide multiple data paths for data from different disks: don't
bottleneck at the string controller, for example
- file structure
- use the right data structure (sequential file, B-tree, etc.) for the
right job
- other optimizations
- spread paging files and other frequently used data sets across devices,
controllers, and channels
- very frequently used pages should be fixed in memory
The paper also mentions "current and upcoming" device architectures, including
the brand-new fixed-sector disks, with a constant number of constant-size
blocks per track. This allows:
- all buffers to be a fixed size
- external fragmentation to be eliminated
- disk capacity to be guaranteed
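With a fixed-sector format, usable capacity is just a product of the geometry constants; a tiny illustration with a made-up geometry:

```python
# Illustrative geometry only, not a real device.
BLOCK_BYTES = 512
BLOCKS_PER_TRACK = 32
TRACKS_PER_CYLINDER = 19
CYLINDERS = 800

capacity = BLOCK_BYTES * BLOCKS_PER_TRACK * TRACKS_PER_CYLINDER * CYLINDERS
print(f"guaranteed capacity: {capacity / 2**20:.1f} MiB")
```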
Also mentioned is the possibility of trading in drums for MOS memory. The
paper does not recommend simply expanding main memory, mostly because doing so
would require major changes to the OS.