Unix Implementation
Ken Thompson
Overview
- provide a kernel with the least common denominator of what users would
want the system to do for them
- everything outside of the kernel can be changed by users
- value simplicity over efficiency
Processes
- user processes request system functions by performing a system
call, a subroutine call which turns the user processes into system
processes
- each system process has its own stack, for protection purposes
- user processes get their code from shared, read-only text
segments; this allows sharing of text pages in memory, as well as
obviating the need to swap these pages out (the originals are already on disk)
- user processes have a private, read-write data segment, which
consists of an automatically managed stack, and a manually managed heap;
processes also have an unaccessible (except to the kernel) system data
segment, which stores things like file descriptor tables, etc.
- a process table has one entry per process, each entry containing,
among other things, pointers to the text table (which points to the text
segment) and to the data segments
- a process is created by having another process do a fork; the new
child is an exact copy of the old parent, except for the
value of one register, which the two processes can use to differentiate
themselves
- a parent can wait for the termination of any of its children
(rendezvous)
- new programs are executed by having a process exec the file
containing the program; a new process is not created, but rather, the old
process simply begins executing the new program; it will not resume the old
program when the new one terminates (like goto)
- programs in primary memory are swapped (not paged) to and from
primary memory: either the whole program is in primary memory, or none of it
is (modulo shared text segments)
- the swapper always stays in memory, and decides what processes to
swap in and out, depending mostly on how long they have been in/out of primary
memory
- processes wait on events; other processes that signal these
events cause all processes waiting on them to wake up
- all processes except one are waiting on an event at any time; when the
running process waits, another process is chosen based on a dynamic priority
system (system processes have higher priority than user processes; user
process priorities dynamically change with the amount of service they receive)
I/O
- two kinds of I/O devices: block and character: block
devices appear to be files of randomly-accessed 512-byte blocks; character
devices can be almost anything else
- block I/O is done via a block cache; writes are always asynchronous,
unless all dirty blocks are explicitly flushed to disk (which happens
periodically, automatically); this can lead to inconsistencies, especially
since data may be written to disk in a different order than they were written
by the application
- character I/O to character-at-a-time devices like terminals and paper tape
readers or writers are handled via character queues (FIFOs); terminals also
have a special canonical input queue which contains line-at-a-time
input, the system having parsed control characters such as erase and kill
File System
- each file is an array of bytes, divided into a sequence of 512-byte blocks
- the space on the disk is divided into space for inodes and for
data blocks; directories are special files that are
maintained by the system that map filenames (14 bytes long) to inumbers (the
index of an inode in the inode list); a data file may appear in more than one
directory, possibly under a different name, simply by listing its inumber in
each one (the device number for a disk and the inumber for a file on the disk
are sufficient to uniquely identify a file); the root directory is
located at a well-known block (2)
- free data block numbers are kept in a linked list; each block in the list
contains the number of the next block in the list, and up to 50 numbers of
free data blocks
- each inode describes the location of the data blocks for a file by having
a list of 13 block numbers: the first 10 point to the first 10 blocks of the
file; the next block points to an indirect block which contains a
list of 128 more block numbers; the 12th block can point to a double
indirect block, which contains 128 indirect block numbers, each of which
points to 128 data blocks; if necessary, the 13th block points to a triple
indirect block, which contains 128 double indirect block numbers, each of
which points to 128 indirect block numbers, each of which points to 128 data
blocks; thus the maximum file size is about 1GB.
- user processes reference files by their pathnames; these pathnames are
converted to inumbers, and thence to inodes, by traversing the directory tree
- each user process can be simultaneously referencing between 10 and 50
files; this is managed by storing pointers to a system-wide open file
table in the system data segment of the process; the open file table
itself contains I/O pointers and pointers into the active inode table
(I/O pointers have to be kept in a separate table, because sometimes two
processes which both have the same file open share the I/O pointer,
and sometimes they don't)
- multiple disk devices can appear to be part of the same directory
hierarchy by mounting one filesystem at any leaf of another's
hierarchy (note: today, this has changed to any directory, not any
leaf); in this way, what appears to be one hierarchy can actually be
composed of many physical devices