Preface

Introduction

This book presents the concepts for design and implementation of modern databases. An analytical approach is followed so that the concepts learned can be applied not only to the wide variety of databases that exist today but can also be used to build systems for the future.

This textbook differs from others in its field by its emphasis on design and the formalization of the semantics that determine design decisions. While anyone who has read about databases will have encountered the various conceptual models, such as relational, hierarchical, network, entity-relationship, etc., they are typically placed in opposition to each other rather than in a continuum of implementations of semantics. Similarly, the mathematical aspects of concepts such as functional dependencies are stressed without providing much interpretation of their meaning. Once we have a formal understanding of how the structure of a database is related to the semantics of the stored information, we can consider the transactions that are being performed as transformations of the represented knowledge. The users are an integral component of such systems. The results they obtain are combinations of the stored data and the transactions they specify. Transactions that update the database incorporate such knowledge in the system and make it available to others.

Scope

We present databases in a setting of transaction processing. Transaction processing introduces modularity into database usage. We also present the implementation of database management in a modular fashion. Modularity is beneficial in presentation as well as in actual system design. The analytical approach permits the performance of database transactions to be evaluated, both for centralized and distributed systems. Some background from the predecessor volume, \nw{File Organization for Database Design}, will be helpful to reduce the conceptual analysis shown here to practical design details. The emphasis throughout is on concepts and structure. No actual database management systems are described completely, nor are systems surveyed and compared. Specific and realistic examples are used throughout to illustrate points being made.

Maintenance

This text distinguishes itself in that it considers the maintenance of the database throughout. The cost and effort to maintain databases are high, and a database system cannot provide useful information unless it is well maintained. We also assume throughout that the database may be distributed over multiple computers. A unique aspect of this text is found in Chapter~7, where a specification for the modules of a database management system is given. This material conveys insight into both why a DBMS is complex and how it may be organized to make the complexity manageable and maintainable.

Our Outlook on Databases

In this book we present a constructive outlook on the issues of database systems. We do not try to define some models, typically the relational, hierarchical, and network types, and then set those models off against each other. Rather, we view the choices implied by these models as rational choices for specific situations. We investigate the tools for making intelligent choices, and use a knowledge-based design methodology to control the process. Now these models can be seen as alternatives in binding strategy, namely how the knowledge about the applications for the databases is embodied in the database systems to be constructed. This outlook also means that modeling concepts are carefully distinguished from implementation concepts. In particular, although we emphasize design concepts based on relational models, augmented with a formal connection concept among relations, we are able to use these design concepts for network implementations as well. Network concepts are then no longer tied to language specifications made 20 years ago, but can cover a range of implementations, past and future. We believe that this outlook assures long-term validity of the material in this text.

Organization

We follow a bottom-up approach in presentation, so that each level of abstraction is thoroughly grounded in earlier material. The layout of this book is presented graphically in Chap.~1, as Fig.~1-6, after the introductory definitions have been completed. Foundation material on software and hardware is presented in Chap.~2. In Chaps.~3, 4, 5, and 6 the fundamental model structures are presented and analyzed. Implementation structures are ... in Chaps.~7, 8, and 9. Performance of entire transactions is the topic of Chap.~10, and the alternative approaches for distributed files are found in Chap.~14. The remaining chapters present complementary material: Chap.~12 details techniques appropriate for advanced performance analysis and Chap.~13 considers data representation. The closing chapter also provides a link to database technology.

Curricula

Modern curricula give increased emphasis to databases. This book presents all the material recommended for course CS-5 (Introduction to File Processing) as specified in the ACM Curriculum on Computer Science (\ref[Austing et al:78]). The quantitative approach taken in this book causes algorithmic and analytic material assigned to courses CS-6 and CS-7 to be included as well, albeit limited by relevance to files. Courses oriented towards data-processing require treatment of databases as well, although analytical issues may not be emphasized. It is clear that a graduate with a Computer Science or Computer Engineering degree should be competent in the concepts encountered when working with databases.

Origin

Some of the material in this book derives from an earlier book on Database Design. Classes based on this book and earlier notes have been given at Stanford University as File and Database Structures since 1971. During this period the field has matured, and we find some stability of fundamental concepts, although advanced applications continue to press for advances in many specific areas. The related book, File Organization For Database Design, stresses performance issues and analysis. We continue to follow an engineering attitude towards system building.

Audience

The audience for this book ranges from students of computing who have finished a reasonably complete course in programming to applications and systems programmers who wish to synthesize their experiences into a more formal structure. Some background in file structures is necessary. The material covered should be known by systems designers or systems analysts faced with database implementation choices. It probably presents too much detail to be of interest to management outside the systems and database management area itself.

Examples and Exercises

The program examples throughout the text use a simple subset of {\csc Ada\footnote{\UPSTAR}{{\ninepoint{\csc Ada} is a trademark of XXX}}}. The programs are designed to be obvious to readers familiar with any procedure-oriented programming language. The introductory examples are annotated to help the reader. Many of the examples illustrate features of actual systems and applications, but are of necessity incomplete. An effort has been made to note simplifying assumptions. The same should be done in students' design assignments, so that awareness of real-world complexities is fostered without overwhelming the design with trivia.

The exercises listed in each chapter have been kept relatively simple. Some of them were inspired by an {\csc acm} {\sl Self-assessment Procedure} (\ref[Solomon:86]). It is suggested that an analysis of your local file system be made part of some of the assignments, as indicated in several of the problem statements. The analysis or comparison of actual systems may seem to be an excessively complex task, but has been shown to be manageable by students when the material of this book has been assimilated. Appendix~A provides references to a number of database systems.

The primary exercise when this course is being taught at Stanford is a design project. Early in the course students prepare an objective statement for a database application of interest to them. Some individual research may be needed to obtain estimates of expected data quantities and transaction load frequencies. Exercises related to this project appear throughout the text and are labeled with a superscript $\sp{p}$.

References

Source material for this book came from many places and experiences. References are not cited throughout the text since the intent is to produce a book which integrates the concepts and ideas. An extensive background section at the end of every chapter cites the major sources used and indicates further study material. The references provide a generous foothold for students intending to pursue a specific topic in depth. The references can also direct research effort toward the many yet unsolved problems in the area. The bibliography has been selected to include some important material for each of the subject areas introduced. Trade publications, research reports, theses, and computer manuals are referenced only when used directly, although much relevant information can be found there. Up-to-date information on computer and software systems is best obtained from manufacturers. I apologize to the authors of work I failed to reference, whether through application of these rules or through lack of awareness on my part. I maintain a large annotated bibliography, which is available. I prefer to distribute the bibliography in computer-readable form since it is too large to be effectively scanned without computer assistance.

Acknowledgments

I have to refer to the acknowledgments in the predecessor volume (Database Design, 1st and 2nd editions) for the many colleagues and students who have helped with the material as it developed. Significant assistance for this book was provided by Jim Gray and Xiao-Lei Qian. Material from theses by Dr.~Robert Blum, Ramez ElMasri, Sheldon Finkelstein, Arthur M.~Keller, Jonathan King, Toshi Minoura, David Shaw, and Kyu-Young Whang has affected the contents of this book. Concepts from papers written with Murray Berkowitz, Bill Brykczynski, Stefano Ceri, Fred~Friedman, Sham Navathe, Xiao-Lei Qian, Domenico Sacc\'a, John Salasin, and David L.\ Spooner have influenced this text as well.

Research support for much of this work came from the Defense Advanced Research Projects Agency (contract N39-84-C211) for Knowledge Based Management Systems, and applications of these concepts to health care were supported by the National Center for Health Services Research (NCHSR HS-3650 and HS-4389) and the National Library of Medicine (NLM LM-4334). I have also benefited from the computer services at Stanford University, some of which are supported by the NIH Division of Research Resources (RR-785). Systems to support our research have been partially provided by the Intelligent Systems Technology Group (ISTG) of the AI Center of Digital Equipment Corporation in Hudson, MA.

The TeX program, developed by Donald \ref[Knuth:79], was used to prepare the plates for printing. The ability to prepare beautiful copy under full control of the author is both an opportunity and a responsibility. I hope to have met both adequately. Caroline Barsalou, Mary Drake, Ariadne Johnson, and Voy Wiederhold all helped with reading and editing chapter drafts. Any errors in content and format remain my responsibility, and I welcome all kinds of criticism.