Unix Implementation

Ken Thompson

Overview

provide a kernel with the least common denominator of what users would want the system to do for them
everything outside of the kernel can be changed by users
value simplicity over efficiency

Processes

user processes request system functions by performing a system call, a subroutine-like call which turns the user process into a system process
each system process has its own stack, for protection purposes
user processes get their code from shared, read-only text segments; this allows sharing of text pages in memory, as well as obviating the need to swap these pages out (the originals are already on disk)
user processes have a private, read-write data segment, which consists of an automatically managed stack and a manually managed heap; processes also have a system data segment, inaccessible except to the kernel, which stores things like file descriptor tables
a process table has one entry per process, each entry containing, among other things, pointers to the text table (which points to the text segment) and to the data segments
a process is created by having another process do a fork; the new child is an exact copy of the old parent, except for the value of one register, which the two processes can use to differentiate themselves
a parent can wait for the termination of any of its children (rendezvous)
new programs are executed by having a process exec the file containing the program; a new process is not created, but rather, the old process simply begins executing the new program; it will not resume the old program when the new one terminates (like goto)
programs are swapped (not paged) between primary and secondary memory: either the whole program is in primary memory, or none of it is (modulo shared text segments)
the swapper always stays in memory, and decides what processes to swap in and out, depending mostly on how long they have been in/out of primary memory
processes wait on events; other processes that signal these events cause all processes waiting on them to wake up
all processes except one are waiting on an event at any time; when the running process waits, another process is chosen based on a dynamic priority system (system processes have higher priority than user processes; user process priorities dynamically change with the amount of service they receive)
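
The fork, wait, and exec notes above can be illustrated with a minimal sketch. It assumes a POSIX system and uses Python's os module as a thin wrapper over the system calls; run_program is a hypothetical helper name, and the exit code 127 for a failed exec is a shell convention, not something the notes specify. In Python the "one register" that distinguishes child from parent shows up as the return value of os.fork.

```python
import os
import sys

def run_program(path, argv):
    """Fork a child, exec `path` in it, and return the child's exit status.

    Hypothetical helper for illustration only.
    """
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. exec replaces the running program;
        # on success it never returns to this code (like goto).
        try:
            os.execv(path, argv)
        except OSError:
            # Reached only if exec itself failed (e.g. no such file).
            os._exit(127)
    # Parent: fork returned the child's process id.
    # Wait for the child to terminate (the rendezvous).
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # child was terminated by a signal

if __name__ == "__main__":
    code = run_program(sys.executable,
                       [sys.executable, "-c", "import sys; sys.exit(3)"])
    print("child exited with", code)
```

Note that the parent and the child run the same code after the fork; only the value returned by os.fork tells them apart.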

I/O

two kinds of I/O devices, block and character: block devices appear to be files of randomly-accessed 512-byte blocks; character devices can be almost anything else
block I/O is done via a block cache; writes are always asynchronous, unless all dirty blocks are explicitly flushed to disk (which happens periodically, automatically); this can lead to inconsistencies, especially since data may be written to disk in a different order than they were written by the application
character I/O to character-at-a-time devices like terminals and paper tape readers or writers is handled via character queues (FIFOs); terminals also have a special canonical input queue which contains line-at-a-time input, the system having already processed control characters such as erase and kill
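
As a rough sketch of how a raw character queue might be turned into canonical line-at-a-time input: the function cook below is hypothetical, and the choice of '#' for erase and '@' for kill is an assumption (the notes do not name the characters). Erase drops the previous character of the current line; kill discards the whole current line; a newline completes the line.

```python
from collections import deque

ERASE = "#"  # assumed erase character: delete the previous character
KILL = "@"   # assumed kill character: discard the current line so far

def cook(raw_queue):
    """Drain a FIFO of raw characters into edited, complete lines.

    Returns (lines, pending): the completed lines, and the text of a
    final line that has not yet been terminated by a newline.
    """
    lines = []
    current = []
    while raw_queue:
        ch = raw_queue.popleft()  # FIFO order, as typed
        if ch == ERASE:
            if current:
                current.pop()
        elif ch == KILL:
            current.clear()
        elif ch == "\n":
            lines.append("".join(current) + "\n")
            current = []
        else:
            current.append(ch)
    return lines, "".join(current)

if __name__ == "__main__":
    typed = deque("lz#s\nfoo@bar\nxy")
    print(cook(typed))  # (['ls\n', 'bar\n'], 'xy')
```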

File System

each file is an array of bytes, divided into a sequence of 512-byte blocks
the space on the disk is divided into space for inodes and for data blocks; directories are special files, maintained by the system, that map filenames (up to 14 bytes long) to inumbers (the index of an inode in the inode list); a data file may appear in more than one directory, possibly under a different name, simply by listing its inumber in each one (the device number for a disk and the inumber for a file on the disk are sufficient to uniquely identify a file); the root directory is located at a well-known inumber (2)
free data block numbers are kept in a linked list; each block in the list contains the number of the next block in the list, and up to 50 numbers of free data blocks
each inode describes the location of a file's data blocks with a list of 13 block numbers: the first 10 point directly to the first 10 blocks of the file; the 11th points to an indirect block, which contains a list of 128 more block numbers; the 12th points to a double indirect block, which contains 128 indirect block numbers, each of which points to 128 data blocks; if necessary, the 13th points to a triple indirect block, which contains 128 double indirect block numbers, each of which points to 128 indirect blocks, each of which points to 128 data blocks; thus the maximum file size is 10 + 128 + 128^2 + 128^3 = 2,113,674 blocks of 512 bytes, or about 1GB
user processes reference files by their pathnames; these pathnames are converted to inumbers, and thence to inodes, by traversing the directory tree
each user process can be simultaneously referencing between 10 and 50 files; this is managed by storing pointers to a system-wide open file table in the system data segment of the process; the open file table itself contains I/O pointers and pointers into the active inode table (I/O pointers have to be kept in a separate table, because sometimes two processes which both have the same file open share the I/O pointer, and sometimes they don't)
multiple disk devices can appear to be part of the same directory hierarchy by mounting one filesystem at any leaf of another's hierarchy (note: today, this has changed to any directory, not any leaf); in this way, what appears to be one hierarchy can actually be composed of many physical devices
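
The inode addressing scheme described above (10 direct block numbers, then single, double, and triple indirect blocks of 128 numbers each) can be checked with a short sketch. The names max_file_bytes and locate are hypothetical; locate only computes which of the 13 slots and which indices inside the indirect blocks a byte offset would use, and does not read any disk.

```python
BLOCK_SIZE = 512     # bytes per block
DIRECT = 10          # direct block numbers in the inode
PER_INDIRECT = 128   # block numbers held by one indirect block

def max_file_bytes():
    """Largest file addressable through the 13 block numbers."""
    blocks = DIRECT + PER_INDIRECT + PER_INDIRECT**2 + PER_INDIRECT**3
    return blocks * BLOCK_SIZE

def locate(offset):
    """Map a byte offset to (slot, path).

    slot is the 0-based position in the inode's list of 13 numbers;
    path lists the index to follow in each indirect block on the way
    down to the data block (empty for a direct block).
    """
    if offset < 0:
        raise ValueError("negative offset")
    n = offset // BLOCK_SIZE  # logical block number within the file
    if n < DIRECT:
        return n, []
    n -= DIRECT
    for level in (1, 2, 3):
        span = PER_INDIRECT ** level  # data blocks reachable at this level
        if n < span:
            path = []
            for _ in range(level):
                span //= PER_INDIRECT
                path.append(n // span)
                n %= span
            return DIRECT + level - 1, path
        n -= span
    raise ValueError("offset beyond maximum file size")

if __name__ == "__main__":
    print(max_file_bytes())           # 1082201088
    print(locate(0))                  # (0, [])
    print(locate(10 * BLOCK_SIZE))    # (10, [0])
    print(locate(138 * BLOCK_SIZE))   # (11, [0, 0])
```

The last addressable byte lands in slot 12 with path [127, 127, 127], which is one way to confirm the "about 1GB" figure.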