A Fast File System for UNIX McKusick, Joy, Leffler and Fabry
Old File System:
- Simple and elegant, but slow, can only achieved 2% of disk
- Blocks are too small, make large file transfers inefficient.
- Consecutive data blocks of files not close together on disk.
- Free list initially created for optimal access, but as time goes by, it
becomes completely random, causing files to have their data blocks allocated
randomly on disk, which forces a seek on every block access.
- i-nodes far from data. 4MB of inodes followed by 146MB of data, this
requires a long seek to get to the data after accessing the inode.
- i-nodes of directory not close together, which leads to poor
performance on operations such as "ls".
New File System
- Larger block size (4096 or 8192 bytes), so that larger blocks can be
transfer in a single disk transaction, which can greatly increase file system
throughput. Blocks are not made larger because most files on UNIX FS are small
(less than 4K).
- Uses fragments to avoid wasting space in large blocks. E.g. 4K/1K FS
waste same % of space as 1K FS but can transfer 4K blocks.
- Divides disk into groups of consecutive cylinders = cylinder groups.
Each group contains super-block, i-nodes, bitmap of free blocks and usage
summary info. Try to put related data into a cylinder group, and spread out
unrelated data as far as possible.
- The headers of cylinder groups spiral down the platters to prevent loss
10% Reserved Free Space:
- On expansion to a file, FS tries to allocate space that can localize
blocks in the file.
- Tries to allocate new blocks that are rotationally optimal, and on the
same cylinder as the previous block.
- If the number of free blocks goes to zero, the performance is degraded
dramatically, because of the inability to carry out the optimizations outlined
- Conflicting goals: Cluster related data so that a large transfer is
possible, but spread out unrelated data so the disk won't fill up in any one
area. The problem with allocating all the data blocks in the same cylinder
group is that large files will quickly use up available space in the group,
forcing a spill over (current and further allocations) to other areas.
- Switch to another cylinder groups at certain file size boundaries.
- Keep directory within a cylinder group. When allocating for a new
directory, find the cylinder group with the most free indoes.
- Spread out different directories.
- Global policy allocates files and directories to cylinder groups. Picks
"optimal" next block for block allocation according to the goals outlined
- Global policy calls local allocation routines to request for specific
blocks. If cannot be fulfilled, then can choose from a sequence of
alternatives: rotationally closest, on same cylinder group, quadratic hash for
another group, and finally, exhaustive search.
- 20-40% of disk bandwidth for large reads/writes.
- 10-20x original UNIX speeds.
- Writes are always slower than reads, because of the allocation costs in
writes. In the old FS, writes are about 50% faster than reads, because of the
disk cache buffer in which write requests are reordered to allow minimal seek
distances. In the new FS, although writes are already presented to the disk in
optimal order, it takes time to find an "optimal" position for a new block.
- Ignore technology trends? Reads less likely than writes, files grow fast
(leading to many writes).
- Long file names.
- Advisory file locks: only applied when it is requested by a program. This
allows sys admin to participate in the locking mechanisms, because sys admin
can override any protection scheme.
- symbolic links across file systems, intermachines.
- atomic rename capability.
- disk quotas: soft limits issues warning until hard limit is reached.
3 Key Features of paper:
- Parametrize FS implementation for the hardware it is running on. (E.g.
rotational latency, disk bandwidth, etc.).
- Measurement-driven design decisions.
- Locality wins.
- Measurements derived from a single installation.
- Ignored technology trends.