Preface
© copyright 1996 by Gio Wiederhold.
Introduction
This book presents the concepts for design and implementation
of modern databases.
An analytical approach is followed so that the concepts learned can
be applied not only to the wide variety of databases that exist today but
also to building the systems of the future.
This textbook differs from others in its field by its emphasis on
design and the formalization of the semantics that determine design
decisions.
While anyone who has read about databases will have encountered
the various conceptual models, such as relational, hierarchical,
network, entity-relationship, etc., they are typically placed in
opposition to each other rather than in a continuum of
implementations of semantics.
Similarly, the mathematical aspects of concepts such as functional
dependencies are stressed without much discussion of
their meaning.
Once we have a formal understanding of how the structure of a
database is related to the semantics of the stored information, then
we can consider the transactions that are being performed as
transformations of the represented knowledge.
The users are an integral component of such systems.
The results they obtain are combinations of the stored data and
the transactions they specify.
Transactions that update the database incorporate such knowledge in
the system and make it available to others.
Scope
We present databases in a setting of transaction processing.
Transaction processing introduces modularity into database usage.
We also present the implementation of database management in
a modular fashion.
Modularity is beneficial in presentation as well as in actual
system design.
The analytical approach permits the performance of database
transactions to be evaluated, for both centralized and distributed systems.
Some background from the predecessor volume, \nw{File Organization for Database
Design}, will be helpful in reducing the conceptual analysis shown here to
practical design details.
The emphasis throughout is on concepts and structure.
No actual database management systems are described completely, nor are
systems surveyed and compared.
Specific and realistic examples are used throughout to illustrate points
being made.
Maintenance
This text distinguishes itself in that it considers the maintenance of
the database throughout. The cost and effort to maintain databases are
high, and a database system cannot provide useful information unless
it is well maintained. We also assume throughout that the database
may be distributed over multiple computers.
A unique aspect of this text is found in Chapter~7, where a
specification for the modules of a database management system is
given. This material conveys both insight into why a DBMS is complex
and how it may be organized to make the complexity manageable and maintainable.
Our Outlook on Databases
In this book we present a constructive outlook on the issues of database
systems.
We do not try to define some models, typically the relational,
hierarchical, and network types, and then set those models
off against each other.
Rather, we view the choices implied by these models as rational
choices for specific situations.
We investigate the tools for making intelligent choices and use a
knowledge-based design methodology to control the process.
These models can then be seen as alternatives in binding strategy, namely how
the knowledge about the applications for the databases is embodied in the
database systems to be constructed.
This outlook also means that modeling concepts are carefully
distinguished from implementation concepts.
In particular, although we emphasize design concepts based on relational
models, augmented with a formal connection concept among relations, we are
able to use these design concepts for network implementations as well.
Network concepts are then no longer tied to language specifications made 20
years ago, but can cover a range of implementations, past and future.
We believe that this outlook assures long-term validity of the
material in this text.
Organization
We follow a bottom-up approach in presentation, so that
each level of abstraction is thoroughly grounded in earlier material.
The layout of this book is presented graphically in Chap.~1, as Fig.~1-6,
after the introductory definitions have been completed.
Foundation material on software and hardware is presented in Chap.~2.
In Chaps.~3, 4, 5, and 6 the fundamental model structures are presented and
analyzed.
Implementation structures are ... in Chaps.~7, 8, and 9.
Performance of entire transactions is the topic of Chap.~10, and the
alternative approaches for distributed files are found in Chap.~14.
The remaining chapters present complementary material: Chap.~12 details
techniques appropriate for advanced performance analysis and Chap.~13 considers
data representation.
The closing chapter also provides a link to database technology.
Curricula
Modern curricula give increased emphasis to databases.
This book presents all the material recommended for course CS-5
(Introduction to File Processing) as specified in the
ACM Curriculum on Computer Science. %[\ref[Austing et al:78]].
The quantitative approach taken in this book causes
algorithmic and analytic material assigned to courses CS-6 and CS-7 to be
included as well, albeit limited by relevance to files.
Courses oriented towards data-processing require treatment of databases
as well, although analytical issues may not be emphasized.
It is clear that a graduate with a Computer Science or Computer
Engineering degree should be competent with the concepts
encountered when working with databases.
Origin
Some of the material in this book derives from an earlier
book on Database Design.
Classes based on this book and earlier notes have been given at Stanford
University as File and Database Structures since 1971.
During this period the field has matured, and we find some
stability of fundamental concepts, although advanced applications
continue to press for advances in many specific areas.
The related book, File Organization for Database Design,
stresses performance issues and analysis.
We continue to follow an engineering attitude towards system building.
Audience
The audience for this book ranges from students of computing
who have finished a reasonably complete course in
programming to applications and systems programmers who
wish to synthesize their experiences into a more formal structure.
Some background in file structures is necessary.
The material covered should be known by systems designers or systems
analysts faced with database implementation choices.
It probably presents too much detail to be of interest to management
outside the systems and database management area itself.
Examples and Exercises
The program examples throughout the text use a simple subset
of {\csc Ada\footnote{\UPSTAR}{{\ninepoint{\csc Ada} is a trademark
of XXX}}}.
The programs are designed to be obvious to readers familiar with any
procedure-oriented programming language.
The introductory examples are annotated to help the reader.
Many of the examples illustrate features of actual systems and applications,
but are of necessity incomplete.
An effort has been made to note simplifying assumptions.
The same should be done in students' design assignments, so that
awareness of real-world complexities is fostered without
overwhelming the design with trivia.
The exercises listed in each chapter have been kept relatively simple.
Some of them were inspired by an {\csc acm} {\sl Self-assessment Procedure}
(\ref[Solomon:86]).
It is suggested that an analysis of your local file system
be made part of some of the assignments,
as indicated in several of the problem statements.
The analysis or comparison of actual systems may seem to be an
excessively complex task, but has been shown to be manageable by
students when the material of this book has been assimilated.
Appendix~A provides references to a number of database systems.
The primary exercise when this course is being taught at Stanford
is a design project.
Early in the course students prepare an objective statement
for a database application of interest to them.
Some individual research may be needed to obtain estimates of expected data
quantities and transaction load frequencies.
Exercises related to this project appear throughout the text and are
labeled with a superscript $\sp{p}$.
References
Source material for this book came from many places and experiences.
References are not cited throughout the text since the intent is to
produce a book which integrates the concepts and ideas.
An extensive background section at the end of every chapter cites
the major sources used and indicates further study material.
The references provide a generous foothold for students
intending to pursue a specific topic in depth.
The references can also direct research effort toward the many yet
unsolved problems in the area.
The bibliography has been selected to include some important
material for each of the subject areas introduced.
Trade publications, research reports, theses, and computer manuals
are referenced only when used directly,
although much relevant information can be found there.
Up-to-date information on computer and software systems is best obtained
from manufacturers.
I apologize to the authors of work I failed to reference, either
due to application of these rules, or because of lack of awareness on my part.
I maintain a large annotated bibliography, and it
is available.
I prefer to distribute the bibliography in computer-readable form
since it is too large to be effectively scanned without computer assistance.
Acknowledgments
I have to refer to the acknowledgments in the predecessor volume
(Database Design, 1st and 2nd editions) for the many colleagues and students who
have helped with the material as it developed.
Significant assistance for this book was provided by
Jim Gray and Xiao-Lei Qian.
Material from theses by Dr.~Robert Blum, Ramez ElMasri, Sheldon
Finkelstein, Arthur M.~Keller, Jonathan King, Toshi Minoura, David
Shaw, and Kyu-Young Whang has affected the contents of this book.
Concepts from papers written with Murray Berkowitz, Bill Brykczynski,
Stefano Ceri, Fred~Friedman, Sham Navathe,
Xiao-Lei Qian, and Domenico Sacc\'a %, John Salasin, and David L.\ Spooner
have influenced this text as well.
Research support for much of this work came from the Defense Advanced
Research Projects Agency (contract N39-84-C211) for
Knowledge Based Management Systems, and
applications of these concepts to health care were supported by
the National Center for Health Services Research (NCHSR HS-3650 and HS-4389)
and the National Library of Medicine (NLM LM-4334).
I have also benefited from the computer services at Stanford University,
some of which are supported by the NIH Division of Research Resources
(RR-785).
Systems to support our research have been partially provided by the
Intelligent Systems Technology Group (ISTG) of the AI Center of Digital
Equipment Corporation in Hudson, MA.
The TeX program, developed by Donald \ref[Knuth:79], was used to prepare
the plates for printing.
The ability to prepare beautiful copy under full control of the author
is both an opportunity and a responsibility.
I hope to have met both adequately.
Caroline Barsalou, Mary Drake, Ariadne Johnson, and Voy Wiederhold all
helped with reading and editing chapter drafts.
Any errors in content and format remain my responsibility, and
I welcome all kinds of criticism.
Gio Wiederhold