KBMS overview

KBMS

edited for WWW by Gio Wiederhold, August 1995

A Precis of Research on

Knowledge-Based Management Systems

1977-1988

People

Investigators

Professor Gio Wiederhold
Departments of Medicine and Computer Science
Stanford University
Stanford, CA 94305-2140

Professor Jack Milton
Department of Mathematics
University of California at Davis
Davis, CA 95616

Prof. Stefano Ceri (summers 1982 - 1988)
Politecnico di Milano
Milano, Italy

Participants

Peter Apers
Vrije Universiteit, Amsterdam, The Netherlands; now Prof. at the Tech. Un. Twente, Enschede, Holland.

David Beech
Hewlett-Packard Research Laboratory, Palo Alto, CA; Oracle

Robert Blum
MD, PhD, now in practice, Menlo Park CA.

Ralph Cherubini
Digital Equipment Corporation, Hudson MA

Christian Esculier
Paris and Univ. of Grenoble, France

Barbara Grosz
PhD, SRI International; now Professor at Harvard Univ.

David Hartzband
Digital Equipment Corporation.

Gary Hendrix, PhD
SRI International; now at Symantec, Menlo Park

Jerrold Kaplan, PhD
Lotus Corp.; Go Corp.; ONSale Corp.

Prof. Sham Navathe
University of Florida, Gainesville, FL; Georgia Tech.

Gordon Novak, PhD (1977)
SRI International, now at Univ. of Texas at Austin

ChoChun Hsu
Beijing University

Arthur M. Keller, PhD
Stanford University and University of Texas at Austin

Kurt Konolige, PhD
SRI International

Ingeborg M. Kuhn, PhD
Stanford AAMRS project; later at Veterans Admin., San Francisco

Prof. Robert Paige
Rutgers University, New Brunswick NJ; now at NYU

Udo Pletat, PhD (summer 1988)
IBM Research, Stuttgart, FRG; now Professor Germany

Tore Risch, PhD (1990-1991)
HP fellow, now Professor at Linköping Un., Sweden

Earl Sacerdoti, PhD (1977-1980)
SRI International; now at Teknowledge, Palo Alto

Daniel Sagalowicz (1977-1981)
PhD, later at Teknowledge, Palo Alto; Framentec, Monaco, and now at Oracle.

ZaiSheng Song
Beijing University

ShiWei Tang
Beijing University

Mike Walker
Stanford University (later MIS PhD); now a Consulting Professor in the MIS program.

Prof. David Warren
SRI International; now Professor at Manchester Univ., Great Britain

Richard Weyhrauch, PhD
SRI International and Stanford University

Prof. Marianne Winslett* (1983-1988)
Professor Univ.~of Illinois

JuanFen Yu
Beijing University

Research Assistants:

Most of the credit for the results achieved is due to the students who participated in the project. This list includes research assistants within the KBMS project as well as students who were supported by external funding or by related projects and participated as part of their academic work.

Name (year) subsequent venues

  1. James Allchin (1979-1980) PhD GA Tech 1983; Banyan Sys., Westboro MA; Microsoft
  2. Jim A. Andrews (1981)
  3. Mike Anderson (1981)
  4. Anne Beetem (1977-1981) Un. Wisconsin
  5. Robert L. Blum*, MD (RX project) (1980-1982) PhD, Stanford RADIX, Kaiser
  6. Stefano Ceri (1981) Prof. Politecnico di Milano
  7. James Brinkley*, MD (supported by Dept. of Medicine, Gyn-Ob.) (1980-1983) Prof. Univ. of Washington
  8. Sang K. Cha* (1984-1990) Prof. National Univ. of Korea, Seoul
  9. Edward Chan (1981)
  10. Surajit Chaudhuri (since 1988 supported by NAIL!) (1985-1990)
  11. Lei-Hou Chee (CVS) (1983-1984) IBM Santa Teresa Lab
  12. Francisco Corella (1981-1982) Symantec, IBM Yorktown
  13. Lori Craven (Bell OYOC) (sum. 1980) Bell Naperville
  14. James E. Davidson* (1979-1983) Teknowledge
  15. JinLi Dou (1982 summer)
  16. Steven Downs, MD (RADIX project) (1984-1985)
  17. Ramez A. ElMasri* (1977-1980) Honeywell; Prof. Univ. Houston; Univ. Texas, Arlington TX
  18. Goran Fagerstrom (1984-1985)
  19. Sheldon Finkelstein (1977-1982) IBM Res. SJ; Tandem
  20. Victor Franco (Bell OYOC) (1983) Bell
  21. Erik Gilbert* (supported by Lawrence Livermore Lab) (1978-1982)
  22. Hector Garcia-Molina* (1977-1979) Prof. Princeton; Stanford Un.
  23. Keith Hall (1986-1992) Consultant, Germany
  24. Waqar Hasan (1984-1988, 1994-1995) Hewlett-Packard
  25. Ram Kedlaya (1983 summer) UTex, Austin
  26. Arthur M. Keller* (1980-1985) Prof. UTex, Austin; Research Scientist Stanford Un.
  27. Jonathan King* (1979-1982) HP, Symantec, Teknowledge (deceased)
  28. CharLin Koo* (supported by Civ.Eng. Dep.) (1985-1987) Teknekron; Intellisys; ReliaSoft
  29. Byung Suk Lee* (1988-1991) Prof. at the National University, Seoul, Korea
  30. Wyatt Leung (1986-1987)
  31. Linda DeMichiel* (HCP, supported by Lucid) (1983-1990) IBM Research
  32. Toshimi Minoura* (Prof. Susan Owicki) (1980-1981) Prof. USC; Oregon State Un.
  33. Robert W. Molter (US Army) (1983-1984) Army Pers. Office
  34. Diep Lan Tranh (1984)
  35. Kin-Kee Leung (1980)
  36. Kurt Lingel (Amdahl Corporation) (1985)
  37. Mohammed Olumi (part. supported by IBM) (1979-1983) IBM Res, H-P, SAIC
  38. Edwin Pednault* (1980-1986) SRI AI center; AT&T
  39. Barbara Pernici (1983-1984) Politecnico Milano
  40. Patricia Pickett (RX project) (1979-1980) MD
  41. XiaoLei Qian (supported since 1987 by Kestrel Inst.) (1983-1990) Bell Labs, SRI International
  42. Peter Rathmann (NSF fellow 1982-1985) (1982-1990) Petrovision; Silicon Graphics
  43. Neil Rowe* (1980-1983) Prof. NPGS Monterey
  44. Thomas Rogers (1977-1981) HP; Symantec
  45. Gary Rubin (ARAMIS project) (1981)
  46. Luigi Semenzato (H-P) (1984)
  47. David E. Shaw* (1979-1980) Prof. Columbia U.; D.Shaw Ass.
  48. Kitty Shih (RLIN) (1983-1984) Stanford RLIN
  49. Mien Shih (supported by IBM) (1983-1986)
  50. Garrett Short (1979-1980) Calma, EDS
  51. John Shoch* (supported by XEROX PARC) (1977-1979) XEROX; consultant
  52. Arun Swami* (supported since 1987 by HP) (1984-1990) IBM, Silicon Graphics
  53. Kyu-Young Whang* (1980-1983) IBM Yorktown; Prof. KAIST
  54. Marianne Winslett* (Bell fellow 1982-1986) (1982-1987) Prof. Univ. of Illinois
  55. John Woodfill (NSF fellow) (1984-1986)
  56. Lena Yesil (1981) Stanford Consulting Group
* obtained their PhD within the KBMS project.

Acknowledgement

This research has been mainly supported by the Defense Advanced Research Projects Agency under DARPA MDA903-77-C-322, N39-80-G-132, N39-82-C-250 as task (3), and N39-84-C-211 as task (10). The monitoring agency has been the U.S. Navy SPAWAR office, formerly NAVALEX.

A task (6) on disjoint domains was supported directly by SPAWAR. Implementation of concepts developed earlier is supported by the DoD STARS program, through a contract with SRI International. Knowledge-acquisition research was partially supported through the RX project, funded by the National Center for Health Services Research (HS-3650 and HS-4389) and by the National Library of Medicine (LM-4334).

Work in natural-language interaction has been partially supported by NSF (IST-8023889).

Computer facilities for medically related applications were provided by the Division of Research Resources of NIH (RR-785). Additional equipment is being provided through a cooperative grant from Digital Equipment Corporation, through their Hudson, MA, Artificial Intelligence Research Group.

The views and conclusions contained in this document are those of the authors and should not be interpreted as representative of the official policies, either expressed or implied, of any agency of the U.S. Government.

The Knowledge-Based Management Systems Project

Summary

The Knowledge Based Management Systems (KBMS) Project addresses the problems of intelligent processing of information from large databases. Such databases are vital to the management functions of modern enterprises. In modern computer networks these databases are typically accessed indirectly, since they reside on other local and remote computers. The supporting systems may be autonomous and heterogeneous. The data is stored on files and maintained by a variety of programs. To make effective use of such data the available knowledge about the data semantics and representation has to be formalized in such a way that it can be exploited automatically.

We have investigated a wide variety of types of knowledge about data and databases, from relatively formal semantics based on relational models to probabilistic and uncertain concepts. The goal of the individual investigations is to devise methods which make access to data more convenient, more effective, or more efficient.

Data and knowledge bases will only be of value if they are comprehensive within their scope and up-to-date. We have devoted much of our research effort to the problems arising with the maintenance of large databases and, more recently, large knowledge bases.

Rationale

The utility of computers and computer-based systems in large organizations has been limited more by inadequate techniques for managing large quantities of data than by our technical ability to gather and store those data. Existing systems for collecting, storing, and retrieving data are too inflexible and complex for easy extraction of relevant, concise information. To obtain information, data must be selected and processed so that it can be of value to decision-makers. Traditional output from databases produces large reports and messages from the execution of formally stated queries, but this represents primarily selected data rather than focused information. Typically, several layers of management and technical staff must intervene between the data systems and the users, impairing the efficiency and responsiveness of decision-making that depends on information stored within the databases.

Managers are aware of the value of control of and access to information; they depend on control of the data developed and used by their enterprises. The centralization of this data, which until recently was an essential part of having and sharing large databases, has resulted in a loss of control because of awkward access and dependence on remote personnel. While economic and technological considerations favored centralization of data storage and processing in the past, recent directions in hardware development, including VLSI technology, are making available networks of smaller, yet very powerful, special-purpose computers that can be used to collect, store, and process information close to its source. Loosely coupled networks of different machines, personal workstations, etc., are likely to be the dominant type of computing environment within a few years. Consequently, local control of local information will be the rule rather than the exception. Such distributed environments pose new difficulties for the management of data resources. For instance, while control over local data has improved, access to remote data may be more difficult.

Another challenge presented by the move to distributed data and distributed systems is that the knowledge and authority required to manage the diverse sources of information are themselves distributed among multiple sites [Wiederhold;82e]. Familiar methods of data management--in which perfect information about the structure, semantics, and content of the data is normally assumed--are not adequate to deal with the less structured environment encountered in truly distributed systems. We have found such areas, where knowledge is not perfect, ideal for the application of artificial intelligence techniques.

Developments in artificial intelligence over the last two decades provide concepts that are of value in increasing the responsiveness of database systems to their users' needs. To pursue our work in applying artificial intelligence techniques to the practical domain of database management, we study the problems of data management in centralized and distributed computing environments, and the management of the knowledge that pertains to that task.

Difficulties occur in every aspect of data management: database design, understanding the information potential of the database, and actual use of the computer. Specific issues include data abstraction, query processing, security, redundancy, optimization, integrity, and decentralized processing. The management of knowledge becomes more specific when applied to databases. While the quantity of knowledge needed may be large, it has a scope defined by the corresponding databases and operates in a world which in many respects is closed relative to that scope.

Research Approach

The history of database development and research shows a continuing interaction of industrial and research efforts. The scenario is often that industrial achievements concentrate on handling larger tasks with adequate performance, and research contributes formalizations and clarifying methodologies [Wiederhold;84b]. We follow that scenario.

Our research probes the quantitative and complexity boundaries of data management. Large quantities of data, complex data relationships, and the limits of storage technology all require that advanced techniques be pursued if realistic problems are to be approached effectively.

In order to understand the problems formally we typically begin by investigating methods and algorithms available in database design and operation [Wiederhold;83b]. We first ascertain the limits of algorithmic approaches and new technology, and subsequently develop heuristic methods to probe beyond those boundaries. In much of our research we subsequently bring to bear the techniques developed in the artificial intelligence (AI) community for dealing with partially understood, arbitrarily structured problem domains [Barr;80], [Wiederhold;83f].

The introduction of heuristics leads to imprecision. We can quote, however, the British mathematician Alan Turing [Hodges;83], who stated in 1946:

In other words then, if a machine is expected to be infallible, it cannot also be intelligent.

The capability of symbols, manipulated through these AI techniques, to represent generalizations and abstractions, provides the basis for intelligent behavior. Recognition of uncertainty, which is an essential aspect of dealing with higher level concepts rather than with specific instances, is hence an essential feature of knowledge in our problem domain.

The specific AI tools are based on such concepts as semantic modeling and build on existing database technology. The databases are effective keepers of information describing the instances that the knowledge is applied to [Wiederhold;86a]. Although the traditional province of database management is business data processing, we are also considering the data management of large-scale scientific problems [Wiederhold;82g]. The diagram in Section 10 of this report provides a layout of the tasks we have addressed.

To provide an experimental basis for our research we maintain several large databases. One of these includes data on 34,000 merchant ships, ports, and shipping information. A second database contains design specifications for the DEC PDP-11 computer.

Most of the early work was carried out on DEC-10 and DEC-20 computers at SRI International and Stanford. For the new tasks, described in Section 5, we have equipment configured to emulate a modern workstation, with 8 megabytes of memory and 600 megabytes of storage, and access to the Ethernet. The current computing engine is a VAX 750, with graphic and speech output capabilities. Primary system software is Common Lisp and the Rdb relational database under VAX/VMS.

A variety of database-management approaches have been used in our research: network [CODASYL;71,74] and relational [Codd;70,71], [Blasgen;77], [Held;71], as well as special ones. The medical data are kept on TOD [Weyl;75]. The bibliographic database is available for search in fully inverted form using the News Service (NS) system on the Stanford SAIL computer. This bibliography can also be shipped over the ARPAnet. A printout can be furnished upon request, but we consider it too large for manual perusal.

We concentrate on promising task areas within our research definition rather than on building complete systems. Since this note is intended principally as a review of our own research, we have not cited all of the many publications of other authors that have provided us with pertinent results, important ideas, and useful guidance in respect to general research directions. General references which contributed significantly to our research are listed in Section 12 of this report, while the papers published under the aegis of the KBMS project are listed in Section 11.

Research Areas and Results

Our research falls into the areas of database design, human interfaces for databases, and performance models of database operation for both centralized and distributed databases. Our work is concentrated at a high conceptual level so that our results may be applied to relational, hierarchical, and network implementations. The application of knowledge about databases is a critical ingredient in all of our research [Wiederhold;84a].

We have categorized our efforts into six areas, although many issues overlap. The six sections (4 through 9) which follow provide short summaries of our research. Many subsections represent a PhD dissertation and related research. Section 10 presents general conclusions about the results obtained.

  1. Sec.~4, Design Methodology:
    The Structural Model;
    The Data Definition Facility of Critias;
    Integration of Diverse User Models;
    Integration of Diverse User Models at Diverse Sites;
    The Design of Physical Databases from the Integrated Model;
    Distributed Database Design;
    Design of Database Management Systems.
  2. Sec.~5, The Architecture of Knowledge and Databases:
    Inferencing Over a Database;
    Formal Semantics of Nulls;
    Subset Definition;
    Lexicon Management.
  3. Sec.~6, Maintenance of Databases:
    Update Constraints;
    Updating through Views;
    Natural-Language Database Updating;
    Finite Differencing;
    Maintenance of Distributed Databases;
    Distributed Resiliency Mechanisms.
  4. Sec.~7, Querying Databases:
    Graph Model of the Database;
    Cooperative Interactive Interfaces to Databases;
    Descriptive Responses to Database Queries;
    Task Models;
    Semantic Query Optimization;
    Common Expression Analysis;
    Use of a Statistical Abstract of a Large Database.
  5. Sec.~8, Applications:
    Experiments with Database Technology to Support VLSI Design;
    Planning;
    Experiments in Intelligent Data Analysis.
  6. Sec.~9, Tools:
    File Access System;
    Intelligent File Systems;
    Database Machines;
    Access to Optical Storage.

We will now present these topics in more detail and cite the relevant references. Note that this precis is limited to work actually performed. Work being planned or in progress will be reported in further editions of this precis, as well as in separate publications.

Knowledge Base Structures

Problems of knowledge maintenance in large knowledge-based systems motivate our research. Today these problems are evident in only some instances, but will become more prevalent as knowledge-based systems grow in scope and depth, and last beyond the lifetime of a PhD thesis. Some researchers from the AI community have looked towards database technology to help in dealing with issues of size and update management [Kerschberg]. Database systems have focused on simple structuring and normalization to deal with large bodies of information, and do not deal well with the complexities of structures needed to represent knowledge. We are using concepts from database research here as well, but must be very careful in intermingling database and knowledge-base representations. We need to avoid creating a combination with the weaknesses of the two fields, rather than the strengths. Future information systems will benefit from distributed knowledge sources and distributed computation. An architecture to deal with future systems must consider the technological opportunities that are becoming available. We see these systems supporting decision-makers through a two-phase process:
  1. Locating and selecting relevant factual data and aggregating it according to the decision alternatives.
  2. Processing and reducing the data so that the number of alternative choices to be decided among is small, and the parameters for each choice are aggregated to a high conceptual level.
Today most of these support tasks are carried out by human experts who mediate between the database and the decision maker. For many tasks in medicine, warfare, emergency relief, and other areas requiring rapid actions, dependence on human intermediaries introduces an intolerable delay. Future information systems will increasingly need to use automatic mediators to speed up these support processes [Wiederhold89c].

The databases, the mediators, and the applications will all reside on nodes of powerful networks. The end-users will always have computers available to serve their specific tasks. We refer to those machines as application workstations, although they may at times be large and powerful processors.

A Frame-Based Interpreter to Demonstrate Mediation

To demonstrate the concepts of a frame-based interpreter which accepts directions and focus from frames, we have chosen the task of assigning reviewers for journal paper submissions. The data sources are distributed, the mediators are distinct, but all code was combined into one demonstration program. Five SoDs serve the task:
  1. Relevance: we need reviewers with a background relevant to the submitted paper. This task is performed by matching in a keyword classification hierarchy.
  2. Quality: we prefer the most qualified reviewers. For this task we rank potential reviewers based on their published output in books, journals, etc.
  3. Conflict avoidance: we cannot assign friends or colleagues of the authors as reviewers. Colleagues are people who have been affiliated with the same institution as one of the authors with overlapping time intervals.
  4. Coauthor avoidance: coauthors of other papers with one of the authors are also to be avoided.
  5. Responsiveness: The reviewers must produce their reviews in time. Here we can look at a log of electronic-mail interactions.
We can use this example to elucidate the difference between the AI paradigms allocated to levels H3 and H2. The tasks in the SoDs at layer H2 can all be defined quite formally. At the top layer H3 some unwarranted but critically important pragmatic heuristics are used to implement the reviewer-selection task. For instance:
  1. Having written high-quality publications in a topic area does not assure that the candidate will do equally well as a reviewer. It is the best guess that our application task can make, but we all know some excellent critics who do not write much. The mapping qualified_writer -> qualified_reviewer is pragmatic. The establishment of a set of qualified_writers is adequately formal to justify its allocation to a SoD.
  2. Having worked together does not make one a friend, and being a friend does not imply favoritism. But we do need to weed out risky matches--in fact, due to prior publications the most likely best match is the submitter of the paper.
  3. Electronic mail responsiveness is probably only weakly correlated with fast reviewing: there are people who respond instantly to email and never respond to review requests.
The language currently used between layers H1 and H2 is Lisp because it supports the extensibility essential to rapid research progress.

The data accessed by the first four SoDs are distinct views of an extensive bibliography of knowledge and database references. Information exploited includes type of publication (for the Quality-SoD), authors (the principal identifiers), author's location (for the Conflict-avoidance-SoD), publication details and sequence with dates (for the Conflict-avoidance-SoD), coauthor lists (for the Coauthor avoidance-SoD), as well as title, abstract, and classification (the last three are used by the Relevance-SoD).

Two of the SoDs are currently implemented--relevance and conflict avoidance. They are implemented as Lisp programs which have access to the object system and a commercial relational database. As we gain more experience, we intend to replace the Lisp code with a more declarative representation.

At the simplest level, the relevance SoD takes a keyword (or list of keywords) describing the subject of the paper to be reviewed, and looks in the database for authors who have written papers on the keyword(s). If enough authors are returned by this database query, this is all that happens. If, however, the database query does not find enough authors, or if the application asks for more candidate reviewers later, the relevance SoD replaces the original query by a more general one, in order to increase the cardinality of the result.

This capability is an example of query generalization [Chaudhuri;89]. It is possible because the SoD makes use of some of the semantics of the keywords. The keywords are arranged in a hierarchy, in which the parent is the more general keyword and the children the more specific. If a query does not return sufficiently many results, a concept of semantic distance in the hierarchy is used to suggest alternate keywords to try.
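
To make the mechanism concrete, here is a minimal Common Lisp sketch of such hierarchy-driven generalization. The keyword hierarchy, the sample data, and the function names are illustrative stand-ins, not the actual SoD interface:

;; Parent links encode the keyword hierarchy: (child . parent).
(defparameter *keyword-parent*
  '((knowledge-base . databases)
    (query-optimization . databases)
    (databases . computer-science)))

;; Stand-in for the bibliographic database: (author . keyword).
(defparameter *papers*
  '(("Codd" . databases)
    ("Ullman" . query-optimization)
    ("Brachman" . knowledge-base)))

(defun authors-for (keyword)
  "Stand-in for the relational query: authors who wrote on KEYWORD."
  (loop for (author . kw) in *papers*
        when (eq kw keyword) collect author))

(defun find-reviewers (keyword wanted)
  "Collect candidate reviewers for KEYWORD; if too few are found,
replace the query by one on the parent, more general, keyword."
  (let ((found (authors-for keyword))
        (parent (cdr (assoc keyword *keyword-parent*))))
    (if (or (>= (length found) wanted) (null parent))
        found
        (union found (find-reviewers parent wanted) :test #'equal))))

Here (find-reviewers 'knowledge-base 3) first queries on knowledge-base, then widens to databases and, if still short, to computer-science, mirroring the stepwise relaxation described above.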

The data structures used by the SoD are designed to support this kind of iterated query style efficiently. A set of authors accumulated by a succession of related queries can be found with about the same total effort as would be needed to find that same set of authors with one more general query.

The application interface is simply a set of Lisp functions which the application can use. As our system evolves, we intend to build a higher-level interface. In our design, an application which is looking for reviewers would submit a query of the form:


select best 3 reviewer
  from relevance-SoD
  where relevant = 'knowledge-base'
    and reviewer not in
        (select all friend
           from conflict-avoidance-SoD
           where author = 'Gio Wiederhold')

Note that this query refers to two different SoDs. Since a particular SoD can only answer queries about its own domain, this query is translated into a slightly lower-level form, which specifies the individual queries to the SoDs and the information flow between them:


X := select all friend
         from conflict-avoidance-SoD
         where author = 'Gio Wiederhold'
Y := select best 3 reviewer
         from relevance-SoD
         where relevant = 'knowledge-base'
         and reviewer not in X

RETURN Y

Note that even this second query is fairly high level. It refers to such abstractions as relevant, which are implemented by the SoDs.

At the end of the funding period we shall be able to demonstrate the complete system, with all five SoDs. In addition we expect to be able to demonstrate advanced techniques for combining the knowledge and results from the five SoDs to achieve a balanced assessment of the reviewers. In particular this requires a means to weigh conflicting information from the SoDs, as a given individual may be outstanding according to some criteria but poor according to others.

Integration of Object-Oriented Methods with a Relational Architecture

Person: Barsalou
This work investigates the hypothesis that an object-oriented approach can serve as a unifying framework for developing expert database systems (EDSs). Objects offer the appropriate level of abstraction for EDSs [Barsalou;88a, b, 89a, b]. Storing information in the form of complex objects, however, can seriously inhibit sharing and flexibility, since persistent objects bind application-dependent knowledge to the data. A desirable compromise is to exploit existing database technology by defining an object-based layer on top of a relational DBMS. This approach does not call for storing objects explicitly in the database, but rather for generating and manipulating temporary object instances by binding data from base relations to predefined object templates. This work focuses on both the theoretical understanding and the practical aspects of managing complex objects in a relational framework.

The primary intent of our formal model is to remedy the lack of structural abstraction in the relational model. Objects are specified recursively as follows: a base object is a non-empty set of projections defined on the base relations; a composite object is a set of other objects. Composite objects can be further subdivided into non-recursive and recursive objects. To render practical the mapping between the relational store and the object working representation, we add constraints to this general definition. Those additional constraints rely on the use of a semantic data model--the structural model--and enable us, among other things, to derive a unique hierarchical structure for base objects, to support record-valued and set-valued domains, and to guarantee termination for specific types of composite recursive objects.

Our prototype system, called Penguin, defines three components for the object layer: (1) The object generator maps relations into object templates, where each template can be a complex combination of join and projection operations on the base relations. (2) The object instantiator provides nonprocedural access to the actual object instances. Combining the database-access function (stored in the template) and a declarative query, Penguin automatically generates the relational query and transmits it to the DBMS, which in turn transmits back the set of matching relational tuples. Those tuples are then assembled into new object instances. (3) The object decomposer implements the inverse function; that is, it maps the object instances back to the base relations. This component is invoked when changes to some object instances need to be made persistent at the database level.
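
The following Common Lisp sketch suggests how such templates might look; the relation names, attributes, and the run-query stub are invented for illustration and are not Penguin's actual interface:

;; A template records how to assemble an object from base relations;
;; instances are generated on demand, never stored in the database.
(defstruct template name root-relation joins projection)

(defparameter *patient-template*
  (make-template :name 'patient-with-visits
                 :root-relation 'patient
                 :joins '((visit patient-id))     ; join VISIT on key
                 :projection '(name birthdate visit-date diagnosis)))

(defun run-query (root joins projection key)
  "Stub for the relational back end; returns canned tuples."
  (declare (ignore root joins projection key))
  '(("Smith" "1950-01-01" "1988-03-02" "arthritis")))

(defun instantiate (template key)
  "Object instantiator: derive the relational query from TEMPLATE,
submit it to the DBMS, and bind the resulting tuples to a temporary
object instance."
  (list :object (template-name template)
        :instances (run-query (template-root-relation template)
                              (template-joins template)
                              (template-projection template)
                              key)))

An object decomposer would implement the inverse mapping, writing changed instances back to the base relations through the same template.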

We expect our results to indicate that an object-based architecture provides (1) efficient access to and manipulation of complex units of information while preserving the advantages associated with persistent storage of data in relational format, and (2) a domain-independent, bidirectional communication channel between relational database systems and expert systems.

In-Query Cooperative Ad hoc Query Interface

Person: ChaSK
Ad hoc query interfaces are critical to solving open-ended problems arising from human organizational activities. Providing cooperative ad hoc query interfaces to users has become more important than before, as the growing number of personal computers and workstations leaves more end users in direct touch with databases. These users cannot afford intermediary programmers to provide handcrafted solutions for a given task.

For ad hoc querying, linear query languages have been widely used; SQL and restricted forms of natural language are such languages. Most ad hoc query interfaces based on linear query languages are, however, passive. Their behavior is characterized by the passive reception of user queries and the provision of isolated textual user guidance. Passive ad hoc query interfaces are more frustrating to end users than a typical operating system interface, because database queries tend to be more complex than operating system commands, and thus more prone to various types of query failure. Such query failures are a major obstacle to high end-user productivity.

Kaleidoscope is a cooperative ad hoc query interface designed with the productivity of end users in mind. Two query languages are considered: SQL and a restricted form of natural language. The key feature of Kaleidoscope designed for increasing the productivity of end users is multi-step in-query cooperative guidance. Instead of receiving query constituents passively, the system leads users through a sequence of cooperative interactions to guide them away from the potential failures in a query. At each step, a set of legitimate query constituents, or templates for such constituents, is proposed to the user based on the system's knowledge of the query-language grammar and the structure and content of the underlying databases. In addition, any information that is derivable by adding a query constituent to the partial query structure (but that may be unknown to the user) is displayed in a system message window. Essential to Kaleidoscope's cooperative interactions is a menu front end that facilitates the communication between the user and the system, as in the so-called menu-based natural-language interfaces NLMenu and Inglish.
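
The toy Common Lisp fragment below illustrates one such guidance step, proposing only schema-legal constituents for a partial query; the miniature schema and the function names are invented for the example:

;; A miniature schema: relation -> attributes that may follow it.
(defparameter *schema*
  '((ship . (name tonnage flag home-port))
    (port . (name country depth))))

(defun legal-attributes (relation)
  "Constituents the schema permits once RELATION has been chosen."
  (cdr (assoc relation *schema*)))

(defun propose-menu (partial-query)
  "One guidance step: offer only legitimate next query constituents."
  (let ((rel (getf partial-query :from)))
    (if rel
        (legal-attributes rel)      ; attributes of the chosen relation
        (mapcar #'car *schema*))))  ; otherwise, the relations themselves

For instance, (propose-menu '(:from ship)) offers only ship attributes, so a query that would fail on an unknown attribute can never be composed in the first place.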

Kaleidoscope's in-query cooperative guidance is related to past work on post-query cooperative response. Many types of knowledge turn out to be commonly useful to both in-query guidance and post-query response. However, Kaleidoscope's in-query guidance is based on a more active way of utilizing such knowledge than post-query cooperative systems: if the system is knowledgeable enough to suggest the post-query correction of failed queries, its knowledge should be used instead to guide users away from query failure. Even if most types of query failure are prevented in Kaleidoscope, it is still possible that the result of a query may be unsatisfactory to the user. For instance, a query may produce no tuples, or it may produce too many tuples. In such cases, post-query cooperative response may complement in-query guidance.

Semantics of Knowledge Views

Person: Rathmann
Interacting with and comprehending almost any large and complex system is easier if that system can be decomposed into a number of smaller, simpler parts. Software designers realized this early on, and introduced subroutines and modules to provide global structure to programs. This research provides a semantics for structure and views in knowledge bases.

We consider knowledge to be represented as logical theories, perhaps augmented with non-monotonic assumptions, such as a closed world. Integrating parts of a knowledge base then becomes semantically equivalent to the process of combining subtheories to give a global theory. Since the resulting knowledge base may contain much information which is irrelevant to a given user, it is then helpful to have a capability for customizing the subsets of the integrated knowledge base to give user views.

We need a firm foundation before such operations can be defined. Therefore a portion of this work has focused on issues of semantics of knowledge bases. First order logic is not capable of representing the non-monotonic assumptions which a knowledge base needs to accurately depict the world. To gain such a non-monotonic capability, we have turned to a circumscription-based formalism.

Circumscription provides a very powerful and precise tool for defining unique name and closed world assumptions. Investigations into the unique name assumption have resulted in a new form of circumscription that is well suited for representing the semantics of knowledge bases. It is the first form of circumscription to handle equality in a principled fashion, and the first to provide unique names assumptions by default.
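
For reference, the basic predicate circumscription schema due to McCarthy, on which these extensions build, can be written as

    \mathrm{CIRC}[T;P] \;\equiv\; T(P) \,\wedge\, \neg\exists p\,[\,T(p) \wedge p < P\,]

where p < P abbreviates (\forall x\,[p(x) \rightarrow P(x)]) \wedge \neg(\forall x\,[P(x) \rightarrow p(x)]): the minimized predicate P receives the smallest extension consistent with the theory T. The equality and prioritized forms developed here refine this schema; the formulation above is the standard one, not the new result.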

A prioritized extension of equality circumscription can then be used to define the semantics of integrating knowledge components and defining views on the resulting knowledge base.

A practical benefit is that the semantic description provides a guide for implementation. In particular, it provides a test for when queries to the global knowledge base may be decomposed into queries to the components, as well as a technique for performing the decomposition.

This formalism also suggests the beginnings of a design theory for knowledge bases. In particular, it provides checks for when a knowledge base contains redundant information, as well as a test for when a view faithfully reflects the state of the global knowledge base [Rathmann;89].

Performing Database Operations over Mismatched Domains

Person: DeMichiel
A database system must often combine data from a variety of sources so that decisions can be based on a broad range of information. When data must be obtained from several database systems, it may occur in forms that are not directly comparable.

We present a solution to the problem of correctly performing relational database query operations despite data domain mismatch. We use the mechanisms of domain mappings, virtual attributes, and partial values to resolve domain differences and to map conflicting attribute values to common domains. We formalize operations over relations containing such virtual attributes and partial values by the definition of an extended relational algebra. We analyze the properties of this extended algebra and describe the system we have implemented that uses it to resolve domain mismatch and to perform operations over information from mismatched domains. This work enables information derived from multiple sources to be presented in an integrated form [DeMichiel;89a, b].
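
As a hedged illustration of the partial-value idea (our rendering, not DeMichiel's actual definitions): a partial value is a set known to contain the true value, and equality over partial values becomes three-valued, which the extended algebra must propagate. In Common Lisp:

;; A partial value is represented by the set of its possible values.
(defun pv (&rest candidates)
  candidates)

(defun pv-equal (a b)
  "Three-valued equality over partial values A and B: :TRUE when both
are singletons with the same member, :FALSE when the sets are
disjoint, and :MAYBE otherwise."
  (let ((common (intersection a b :test #'equal)))
    (cond ((null common) :false)
          ((and (= (length a) 1) (= (length b) 1)) :true)
          (t :maybe))))

Thus (pv-equal (pv "cruiser") (pv "cruiser" "destroyer")) yields :MAYBE, so a join over mismatched domains can retain the tuple as a possible answer rather than silently dropping it.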

Optimization of Large Join Queries

Person: Arun Swami
Current query optimizers normally expect to process queries involving only a small number of joins (less than 10 joins). We expect that some future applications will require processing of queries with a much larger number of joins. In this research, we investigate the problem of optimizing Select-Project-Join queries with large numbers of joins (specifically 10 to 100 joins) [Swami;88, 89].

The results in this thesis are validated using several different synthetic benchmarks and two different models of join processing costs: first, a conventional disk-based model, and second, a model for memory-resident databases. We develop a cost model for join processing in main-memory databases. This model is validated for the uniform distribution and for more general distributions, including the Zipf distribution. Simulations based on the cost model are used to study the performance of various join methods.

To study the characteristics of the search space of query plans, we obtain the statistical distributions of query plan costs. We observe that, as the number of joins increases, the difficulty of the optimization problem increases dramatically in the range considered. We study the effectiveness of the heuristic of postponing cross products as late as possible. We find that the number of sampled query plans of low cost increases significantly using the heuristic. For 10-join queries, the percentage of low-cost query plans increases from 2.4% to 79.6%, and for 50-join queries from less than 0.1% to 3.8%.

We next address the problem of determining good methods for optimization. Taking advantage of the commonly used heuristics of pushing selections down and performing projections as soon as possible, the problem of optimizing large join queries is reduced to the problem of determining the optimal join order. The latter problem is a hard combinatorial optimization problem. We apply general techniques, such as iterative improvement and simulated annealing, that have proved effective in a variety of combinatorial optimization problems. We then study the use of heuristic techniques, such as augmentation and local improvement, in combination with the general combinatorial techniques. We observe great differences between the many different methods considered. We find that two combinations of the augmentation heuristic and the iterative improvement technique are superior to all the other methods.
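
A compact Common Lisp sketch of the iterative-improvement idea follows; the cost function is a stand-in for the real join cost model, and the move set (random swaps in the join order) is deliberately simplistic:

(defun plan-cost (order)
  "Stub cost model over a join ORDER given as relation sizes; a real
model estimates the sizes of intermediate join results."
  (loop for size in order
        for position from 1
        sum (* position size)))

(defun random-swap (order)
  "One move in the search space: exchange two relations in ORDER."
  (let ((new (copy-list order))
        (i (random (length order)))
        (j (random (length order))))
    (rotatef (nth i new) (nth j new))
    new))

(defun iterative-improvement (order tries)
  "Propose TRIES random swaps, keeping each one that lowers the cost."
  (let ((best order)
        (best-cost (plan-cost order)))
    (dotimes (i tries best)
      (let ((candidate (random-swap best)))
        (when (< (plan-cost candidate) best-cost)
          (setf best candidate
                best-cost (plan-cost candidate)))))))

Simulated annealing differs mainly in occasionally accepting cost-increasing moves, which helps escape local minima in this very rugged search space.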

Use of Constraint Rules

Person: Qian
Programming remains the primary means of instructing computers to perform work. Conventional programming is costly and error-prone. It does not have precise and well-understood principles. The automation of programming has been a dream since the birth of the first computer. Although the concept has changed a lot (what was considered automated programming forty years ago is called compiling now), the idea is still the same: we tell the computer what to do and have the computer itself figure out how to do it.

Database programming is an instance of programming. Besides the common properties it shares with programming, database programming requires knowledge of database semantics: the rules and constraints applied to the database so that it remains consistent and sharable by many users. Such knowledge is necessary both to maintain database integrity and to explore more optimization opportunities. Automated programming in databases is common. The fruitful research in query processing, semantic query optimization, and logic programming shows that automated programming of data-retrieval programs is not only possible but also feasible. Similar approaches can be taken to programming data-manipulation programs, namely transactions [Qian 86, 87, 88a, 88b, 88c].

The paradigm of database management makes automated programming more desirable. Since most database systems today support, to various degrees, the specification of database semantics, it is time to provide automated tools to exploit this semantics. User-supplied transaction specifications are often small increments to the specification of database semantics, which is oriented towards human understanding rather than machine execution, and transaction programming involves routine translation of such combined specifications into procedural code. Programmers rarely have a complete understanding of database semantics. For reasons such as ease of use and security, databases often provide programmers with access to views, which are selected small portions of data. Transactions are reused over long periods of time through changing situations, while programmers do not stay. Moreover, tremendous opportunities for optimization exist because transactions usually perform small changes to database states. Since manual optimization carries extremely high risks of violating database semantics, it should be the responsibility of database systems to synthesize executable code from transaction specifications in a way that preserves the validity of database semantics.

Database technology makes automation more feasible. Less effort is needed in writing transaction specifications because a large portion of them, namely the database semantics, is already present. Transactions are dominated by data manipulation instead of computation. On the average, eighty percent of the common transaction code is about constraint checking and enforcement. A small number of language constructs, such as insert, delete, modify, test, and iterate, constitute a reasonable transaction language. Simple algorithms and regular data structures suffice for most of the data manipulation jobs. These features significantly simplify the synthesis task.
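
To suggest the flavor of the output, here is a hand-written Common Lisp sketch of the kind of transaction such a synthesizer would produce from an insertion request combined with a referential constraint; the tables and names are invented:

;; Specification: "record that a ship calls at a port", subject to
;; the constraint "every port call must reference a known ship".
(defparameter *ships* '(101 102 103))   ; known ship keys
(defparameter *port-calls* '())

(defun insert-port-call (ship-id port)
  "Synthesized transaction: the guard, derived from the constraint,
is woven into the generated code so that the update can never leave
the database inconsistent."
  (if (member ship-id *ships*)
      (push (list ship-id port) *port-calls*)
      (error "Transaction rejected: unknown ship ~S." ship-id)))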

Certain aspects of database management raise new challenges to automated programming. Transactions are imperative programs which manipulate database states. Specifications are often incomplete, requiring the synthesis of transactions with attached applicability conditions. Database semantics must be utilized to optimize both the synthesis process to reduce search effort and the transactions generated to enhance runtime performance.

We approach the synthesis of transactions which preserve the validity of semantic integrity constraints using deductive techniques. A class of iterative transactions is defined and shown to have exactly polynomial-time expressive power. An iterative transaction logic is developed as the formalism with which the synthesis of iterative transactions can be conducted. Transactions are generated as the by-product of proving specifications in the logic. The deductive-tableau proof system developed by Manna and Waldinger is extended with inference rules for the extraction of iterative transactions from proofs, which require the cooperation of multiple tableaux. It is shown that the proof system is sound and complete for the synthesis of the subclass of first-order transactions. Properties of transaction specifications are studied to identify a class of bounded stratified specifications for which the synthesis process always terminates. Control strategies are also developed which utilize the database semantics to reduce search effort in the synthesis process and to enhance the runtime efficiency of the transactions generated. Our approach is demonstrated with a prototype implementation.

Design Methodology

This area concerns research to improve the techniques of database design and modeling. The structural model, formally defined within the KBMS project, captures those semantics that are of importance in the design of the database's physical structure [Hendrix;75] [Wiederhold;77, 78a, b, 82d, 83a]. The concepts of the structural model have been expanded and applied in other research aimed at facilitating various stages of the database design process. They provide the basis for the data definition facility of Critias, which implements a schema language based on the structural model [Qian;85b]. They are used in the task of integrating diverse user views, whether in a centralized or distributed setting. And finally, they are used to aid the design of physical databases from the integrated user model.

Further avenues of design-oriented investigation have been distributed database design and the design of database management systems.

The Structural Model

Person: Wiederhold
The structural model adds the concept of connections among relations to the basic concepts of relations. Connections model relationships between entities, whereas relations primarily model entities. Connections have formal properties, embodying constraints related to functional and inclusion dependencies. Rules for the maintenance of correctness under update of the database are also given. Three connection types (ownership, reference, and subset) describe distinct constraints imposed by the user on the relationships. The structural model can be considered as having formalized the important relationship aspects of general semantic models. These models (e.g., the entity-relationship models) are used to embody the requirements of the users [Chen;77] [ElMasri;79b] [Shipman;81] [Wiederhold;79a, c, 82b, 83b].
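
As an illustration of the constraints a connection embodies, consider a Common Lisp sketch of update rules for an ownership and a reference connection; the toy schema and the encodings are ours, for exposition only:

(defparameter *patients* '(p1 p2))           ; owner entities
(defparameter *doctors*  '(d1))              ; referenced entities
(defparameter *visits*   '((p1 d1) (p2 d1))) ; owned by a patient,
                                             ; referencing a doctor

(defun delete-patient (p)
  "Ownership rule: owned VISIT tuples cannot outlive their owner, so
deleting a patient cascades to that patient's visits."
  (setf *visits*   (remove p *visits* :key #'first)
        *patients* (remove p *patients*)))

(defun insert-visit (patient doctor)
  "Reference rule: an insertion is accepted only if the owning
patient and the referenced doctor already exist."
  (if (and (member patient *patients*) (member doctor *doctors*))
      (push (list patient doctor) *visits*)
      (error "Update violates a connection constraint.")))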

Our work on database design is based on the use of these structurally relevant semantics, which form the basis for defined connections in the database. Three connection types are adequate to model any structural database constraint other than arbitrary functions: ownership, reference, and subset. When the expected usage loads are mapped to defined connections, the codified knowledge will provide the quantifiable guidance for the physical design. The correctness of binding decisions is necessary to make large databases capable of acceptable performance [Wiederhold;81b], [Ceri;81,85b]. The existence of a model also allows making decisions about reconfiguration of a database as the loads change over time. The problems of asynchronous operation of distributed databases are being modeled using the identity connection [Wiederhold;87].

It does not matter in what manner the connections are implemented. This permits the logical design to proceed independently of the eventual implementation. This freedom is critical for databases which are expected to outlive current technologies and for databases which may be implemented differently on multiple sites in a distributed system.

The primitives of the structural model are also applicable to databases serving non-traditional functions, such as engineering information systems for hardware [Wiederhold;82b] and maintenance control systems for software [Olumi;83].

The Data Definition Facility of Critias

In order to clarify the concepts of schemas independent of their implementation, we have developed a free-standing schema definition language, Critias. The language provides a syntactic embodiment of a semantic data model. The basic modeling concepts of Critias are entity types, attributes, and three types of relationships (subtype, association, and reference), which are intended to model the conceptual objects, their properties, and the relationships among them. Language constructs are provided for specifying semantic integrity constraints at all these levels of modeling concepts. We also show that there is a direct mapping from Critias constructs to relational schemas; hence the implementation of Critias is straightforward [Qian;85b].

The Integration of User Models into a Comprehensive Database Model

In order to generate a comprehensive database model, many data models of individual applications may have to be integrated. The design of a large database must consider many potential users, each with a distinct perception of what the database should look like. The structural model supports the integration of many such views into a database model to support all users [ElMasri;79a, 80]. Included in the model is a formalism for the clear expression of structural-integrity constraints in each user's view of the application area and rules for resolving these constraints relative to all users. The integrated database model can remain invariant while performance and operational constraints cause the implemented version of the database to change [Wiederhold;83a], [Wiederhold;82d]. Each integrated application also obtains access to the data belonging to other applications in a consistent manner. The eventual database submodels for the applications will be richer than their original specification. These revised database submodels can be used to define views and windows for user access to the database.

Integration of Diverse User Models at Diverse Sites

Person: Wiederhold
In a centralized operation a database administrator is responsible for the design of a central database that supports the diverse views of individual users, each of whom can be aware of only a portion of the actual database. In truly distributed situations there may be no single individual with sufficient global knowledge (or authority) to perform this task [Wiederhold;83a]. While no central database will exist, a comprehensive data model is critical for automatic maintenance of overall consistency and for performing distributed processing of queries and updates.

An alternative approach, distributing integrity authority and responsibility to autonomous local sites, has been explored in the context of networks of personal computers. The bulk of actual maintenance actions can be delegated to rule-driven agents, which inspect the update messages [Wiederhold;84d].

The Design of Physical Databases from the Integrated Model

Person: Whang
The structural model permits the attachment of projected usage factors from the individual users, and the combined knowledge can be employed to design an effective database implementation. Such a database may be implemented using technology based on relational, hierarchical, or network concepts [Wiederhold;82a].

A straightforward algorithmic optimization approach is not feasible when designing a multi-file database, since the combinatorics of the decision space greatly exceed reasonable time bounds. Addressing this problem, we have developed a method which will provide implementation guidelines for a relational database design in close to linear time [Whang;81a, b, 83a, 84, 85], [Wiederhold;83a]. The concept developed is termed separability of joins between the participating relations.

When the relations of a database are separable, the allocation of clustering and indexing choices for a database implementation can proceed relation by relation. The assumptions required for this approach appear to be fulfilled over a wide range of practical cases. We have also extended this approach to network databases [Whang;82].

A by-product of this research is an improved equation for the estimation of block accesses when retrieving records from files [Whang;85]. This formula has been adapted for the optimization of query processing in relational systems.
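
For context, the classical exact expression that closed-form estimates of this kind approximate is due to Yao (this is background, not the improved equation of [Whang;85] itself): for a file of n records stored uniformly in m blocks, a query retrieving k distinct records touches an expected

    E[\mathit{blocks}] \;=\; m \left[ 1 - \prod_{i=1}^{k} \frac{n - n/m - i + 1}{n - i + 1} \right]

blocks; the product term is the probability that a given block contributes none of the k records.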

Distributed Database Design

Person: Ceri, Wiederhold
We have explored the design issues in distributed systems and developed useful algorithms for optimality. In this process we have again found limits of design-problem complexity which require the development and evaluation of heuristic approaches. We have developed and published algorithms for the partitioning of databases by selecting fragments defined as tuple sets (horizontal partitioning) or as attribute sets (vertical partitioning) for distribution. In a related project we have developed and evaluated algorithms for distribution within clustered processors.

Our work on database partitioning uses structurally relevant semantics to define connections in the database. These connections and the expected usage loads provide the guidance for the physical binding design decisions [Ceri;81].

This issue maps directly into distributed systems: the cost of connections that go via communication links becomes even more critical. We assume initial user input to select candidate fragments of the database for partitioning and a list of important transactions using those fragments. With each transaction an origin site is associated. A simple model that permits replication but ignores storage and update costs would have that site contain all its needed fragments.

When storage and update costs are considered, the optimal site assignment for the fragments is very difficult. We have been able to formulate and solve an integer programming model for this problem for a modest number of fragments and transactions when there is no replication. We have also analyzed the heuristics required when replication is permitted [Navathe;82], [Ceri;83].
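
One plausible rendering of such a non-replicated allocation model (our notation, not necessarily the published formulation) is:

    \min \sum_{t}\sum_{f}\sum_{s} u_{tf}\, c_{o(t),s}\, x_{fs}
    \quad \text{subject to} \quad \sum_{s} x_{fs} = 1 \;\;\text{for each } f, \qquad x_{fs} \in \{0,1\}

where x_{fs} = 1 if fragment f is placed at site s, u_{tf} is the access frequency of transaction t to fragment f, o(t) is the origin site of t, and c_{i,s} is the cost of reaching site s from site i. Permitting replication relaxes the equality to a lower bound and adds update-propagation costs, which is precisely where the heuristics become necessary.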

Multiple processors may also be used in clusters, perhaps connected by a high-bandwidth local net, in order to achieve higher aggregate performance than a single large processor can provide. When the processors are clustered, the entry point for a transaction is of little concern. Partitioning is now constrained by limits imposed by processor, input-output, and communication capability [Sacca;83], [Sacca;85].

However, in distributed databases a major technique to achieve acceptable performance is the replication of data [Garcia;78c], [Barbera;82]. We are beginning to model this technique using the concept of an identity connection [Wiederhold;83b]. When replication is used in the centralized environment, the binding is implemented quite rigidly (i.e., no access is permitted until all updated replicated elements are set to identical values) [Wiederhold;81b]. We find that the cost of maintaining rigid identity connections is quite high in distributed systems, and are investigating methods that permit a relaxation of those constraints.

In all those approaches to database design, the limits of algorithmic design methods seem to have been reached. The additional complexity that occurs when the fragments can be replicated always requires heuristic approaches [Ceri;81].

When databases are not maintained through a central administrative service, we speak of a loosely coupled system. These systems will require techniques to deal with the uncertainty of identifying relevant data, locating their instances, and determining their status in terms of currency and quality. We have begun to investigate techniques where expert systems help manage these issues [Apers;84b, c]. While there is an inherent level of uncertainty in the obtained results, it appears that the uncertainty can be quantified and manipulated. Development of adequate techniques in this arena is essential so that the benefits of autonomous systems are not outweighed by the difficulties of sharing data in these systems.

The Design of Database Management Systems: DADAISM

Understanding the flow of data and the flow of knowledge to control the data has provided the critical underpinnings for the design of a proposed new DBMS. Here we have to consider the information flow in an application-independent manner. We consider all levels of the database system implementation problem starting with a simple operating system interface and going up to the level of knowledge-based user services [Wiederhold;86b].

This research is motivated by a plan to re-engineer a large multi-site command-and-control system, WWMCCS, using modern software principles. In accordance with Department of Defense (DoD) policies, the software for the WWMCCS Information System (WIS) is designed to be implemented in Ada.

The methodology of the specific work is to start out fresh. This permits the use of modern software and data engineering approaches to database systems, using the power of Ada in the most appropriate way. Most of us believe strongly that the cost of adapting re-engineered applications to 10- and 20-year-old technology, as exemplified by existing DBMSs, is greater than the cost of rethinking and a new DBMS implementation. If we do not put the principles we have learned in the past twenty years to work now, we will be overtaken by others who do not carry the baggage of the past.

The proposed architecture takes advantage of the structure of Ada. Since Ada supports alternative code bodies with the same declarative specification, it becomes natural to maintain and use alternate modules. However, in order to keep the specification unchanged, the initial design must consider the information requirements by any candidate module. In practice, modules developed in the prototyping stage will not use information needed for an optimizing version of the same module type. Similarly, modules for a workstation database will not use much of the security information available for protection in a shared environment [Wiederhold;83e], [Friedman;86].

The context of this research project reinforced the consideration of security and access controls within the initial design, rather than treating security as an add-on. If a high level of access protection is needed, overall reliability and performance will be improved by incorporating clean and appropriate interfaces in the original system design. Whenever the requirements are less strict, the cost of having null or minimal modules in their place will be small [Spooner;85].

Proposing a fresh start for a major system carries a considerable risk. Doing this in the database area is made more feasible by the initial orientation of Ada. Since Ada was designed as a language for embedded systems, it did not inherit any prior data-processing notions, a problem that other general-purpose languages, such as pl/i, have to cope with. Only the most primitive input-output functions are specified in Ada. Hence, we were able to start with a relatively clean slate.

This project presents an example of a data-oriented design methodology. In software projects where we expect that a common data structure will serve many application functions, a data semantics-oriented approach may be a better alternative to a top-down design starting from a procedural objective. In this data engineering approach we still proceed top-down, but start from the information requirements.

Especially when we envisage that modules are replaceable, procedural specifications do not provide a solid base. No procedure can generate information unless the required data and knowledge are made available to the module. When data and knowledge are identified initially, then, in most cases, a competent programmer will have few problems in coding the required transformation procedures and generating a successful program.

On reflection, it appears that the architectural approach proposed, as a conceptualization of architectures found in current DBMSs, bears a significant resemblance to the blackboard concepts used in some Artificial Intelligence systems. An early example of a blackboard is found in hearsay [Reddy;76]; a recent example is bb1 [Hayes;85].

Making the distinction in this design process between:

  • Knowledge: The controlling and general information, typically complex and relatively small in volume.
  • Data: The manipulated and factual information, typically regular but voluminous.

helped reveal the design issues and the rules for information management within the design.

    The Architecture of Knowledge and Databases

As we become increasingly dependent on formalized semantics to handle large databases, we have to be concerned with the architecture of systems that embody both data and knowledge [Malachi;85], [Missikoff;84], [Shuey;86], [Wiederhold;85a, 85c], [Wiederhold;86a]. The knowledge bases must also address issues arising from the inherent incompleteness of the recorded facts [Winslett;86], [Blum;82b].

    KSYS

The experience gained in the KBMS research and from other sources is being marshalled for a major subproject, ksys. The other sources include rx, implementing Knowledge Extraction from Databases [Blum;80, 86a, 86b]. Rx uses frame technology to manage a fairly large database [Wiederhold;81], [Wiederhold;85a]. Additional experience is derived from infobase, developed at DEC [Cherubini;84] and made available to us.

The ksys project specifically addresses the need for an effective interface between AI systems and database management systems [Ceri;86]. The knowledge managed by the AI system represents an enhanced form of the meta-data used to describe databases. The DBMS maintains the facts which represent the continuously changing world and permit new inferences to be drawn by the AI systems linked to the database. We have found, however, that the problems in communicating between AI and database systems do not lie only in building the interface itself. Each side depends on its own representation structures and inference methods [Wiederhold;86a]. In an integrated system the strengths of each must be exploited. This direction reflects a maturation of the research in this field, where we can go beyond existence proofs and consider quantitative measures and architectural feasibility as well.

    The ksys architecture uses:

    1. A frame structure for the knowledge base.
    2. An inheritance scheme which is slot rather than frame based, where each slot is associated with one of an open-ended set of hierarchies.
3. A minimal interpreter for the knowledge base, which can be set to search forward or backward.
    4. An interface from attribute-defining frames to the relational database.
    5. Persistent storage of all instance information in the database.
    A number of research questions are being raised by such an architecture, and will be investigated as resources permit.
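
As an informal illustration of points 1 and 2 of this list, the sketch below shows slot-based inheritance in which each slot is looked up along its own hierarchy. It is a toy rendering in Python, not the ksys implementation; all names are invented.

    # Slot-based (rather than frame-based) inheritance: each slot carries
    # its own hierarchy, given as a map from frame name to parent frame name.

    class Frame:
        def __init__(self, name, slots=None):
            self.name = name
            self.slots = slots or {}       # slot name -> value

    hierarchies = {}                       # slot name -> {frame -> parent frame}

    def define_hierarchy(slot, parent_of):
        hierarchies[slot] = parent_of

    def get_slot(frame, slot, frames):
        """Look a slot up locally, then along the hierarchy for that slot."""
        current = frame
        while current is not None:
            if slot in current.slots:
                return current.slots[slot]
            parent_name = hierarchies.get(slot, {}).get(current.name)
            current = frames.get(parent_name)
        return None                        # slot unknown: incomplete knowledge

    frames = {"ship": Frame("ship", {"cargo-units": "tons"}),
              "tanker": Frame("tanker")}
    define_hierarchy("cargo-units", {"tanker": "ship"})
    print(get_slot(frames["tanker"], "cargo-units", frames))   # -> 'tons'

Because each slot names its own hierarchy, a frame can, for instance, inherit cargo attributes along a ship-classification hierarchy while inheriting registry attributes along a separate ownership hierarchy.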

    Inferencing Over a Database

When databases are large they can implicitly contain more knowledge about a domain than any single expert. To carry out such inferences, a model of the world represented by the database has to be constructed. Such a model involves abstractions and generalizations, and is hence a knowledge base for that domain [Blum;78, 80, 82a, c]. We have used a knowledge-based system to control statistical procedures which discover and verify new relationships among the abstractions represented in the knowledge base, achieving a form of automated learning [Blum;82a], [Walker;86].

To support the learning process it is necessary to abstract details, stored in the database, to higher-level concepts. By moving to abstractions we can bridge missing data [Blum;81]. If we let the concepts extend over time, we can also begin to recognize change, a stronger impetus to action than conventional reports describing a snapshot view of the world [Downs;86], leading to graphic summarizations of a patient's history [deZegher-Geets;88].

    Lexicon Management

    Person: Davidson
For natural-language systems to provide practical access for database users, they must handle realistic database queries involving terms that identify the stored entities, such as names of towns, ships, and people. A formal language expects the user to ask for, say, ``Size of City = `San Francisco' '', but this formulation is unnatural. Natural-language systems overcome this problem by having all such identifying names within their lexicons, in addition to the common verbs and nouns.

However, databases are often quite large and subject to frequent updates. Both of these characteristics render impractical the encoding and maintenance of a lexicon associated solely with the query-language processor, since such a lexicon requires separate maintenance.

    We have developed and implemented a technique for reducing the number of lexical ambiguities for unknown terms by deferring lexical decisions as long as possible. The database itself is then accessed to try to resolve the ambiguities and complete the parsing. A simple cost model selects an appropriate method for resolving any remaining ambiguities [Davidson;80].
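
The following sketch suggests, under invented table and column names, how an unknown term can be resolved by probing the database itself rather than by consulting a hand-maintained lexicon; the cost model of [Davidson;80] is not reproduced here.

    # Resolve a term unknown to the lexicon by probing the database,
    # deferring the lexical decision until candidate columns are known.
    import sqlite3

    NAME_COLUMNS = [("cities", "name"), ("ships", "name"), ("people", "name")]

    def resolve_unknown_term(conn, term):
        """Return the (table, column) pairs in which the term actually occurs."""
        candidates = []
        for table, column in NAME_COLUMNS:
            cur = conn.execute(
                f"SELECT 1 FROM {table} WHERE {column} = ? LIMIT 1", (term,))
            if cur.fetchone():
                candidates.append((table, column))
        return candidates   # one hit: parse resolved; several: still ambiguous

    conn = sqlite3.connect(":memory:")
    for table in ("cities", "ships", "people"):
        conn.execute(f"CREATE TABLE {table} (name TEXT)")
    conn.execute("INSERT INTO ships VALUES ('Mirabella')")
    print(resolve_unknown_term(conn, "Mirabella"))   # [('ships', 'name')]

If several candidates remain after the probe, a cost model or a brief clarification dialog selects the resolution method, as in [Davidson;80].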

    Maintenance of Databases

Before a database can be exploited to yield information, it has to be loaded with data, and these data must be maintained to provide a correct and consistent representation of the real world. Updates to databases are not yet well treated in theory or in practice. Updates to a database may conflict with general integrity constraints, may conflict with previous facts stored in the database, and may involve incomplete knowledge.

Dealing with incompleteness in databases is a major thrust of the KBMS project. It is motivated by the observation that databases, in practice, do not contain data for all attributes of all stored entities. This is in part an essential consequence of sharing data, and cannot be avoided by exhorting users to enter all data. A database pools data from a variety of sources, and each source operates under different preconditions and in a different environment. It cannot be expected that a single person or unit will have the global, detailed knowledge necessary to manage the entire database; rather, each specific user of the database will be qualified to manage only a subset of this large collection.

Specific areas of our research have included: view management, natural-language and ad-hoc planning updates, subset definition, and, as discussed earlier, the effect of null values and other forms of incomplete information.

    Formal Semantics of Nulls

    Person: Winslett, Keller
Suppose one wishes to construct, use, and maintain a database of knowledge about the real world, even though the facts about that world are only partially known. In the database domain, this situation arises when database users must coax information into and out of databases in the face of null values and uncertainty. Such incomplete information arises most commonly from missing data in the database, but also through other scenarios such as updates through views. A database containing incomplete information represents a set of alternative worlds, one of which represents the real world, as opposed to the single alternative world modeled by complete-information databases.

    In the AI domain, the problem of updating a database of incomplete information arises when an agent tries to incorporate new, possibly contradictory information into a database of beliefs reflecting its partial knowledge of the world. In the logic domain, one might choose to represent such a database as a logical theory, and view the models of the theory as possible states of the real world.

How can new information (i.e., updates) be incorporated into the database? For example, given the new information that ``b or c is true'', how can we get rid of all outdated information about b and c, add the new information, and yet in the process not disturb any other information in the database? The burden may be placed on the user or other omniscient authority to determine exactly which changes in the theory will bring about the desired set of models. But what is really needed is a way to specify an update intensionally, by stating some well-formed formula that the state of the world is now known to satisfy and letting the database management system automatically figure out how to accomplish that update.

    Alternatives for semantics of databases with incomplete information and the ramifications of these semantics have been summarized in a paper which won the student award at the IEEE Data Engineering conference in Los Angeles, 1984 [Keller;84a, 85b].

    We have explored a technique for updating databases containing incomplete information. Our approach embeds the incomplete database and the updates in the language of first-order logic, which we believe has strong advantages over relational tables and traditional data manipulation languages when information is incomplete. We have described semantics and found polynomial-time algorithms for update operators [Winslett;86, 85b]. An implementation of these algorithms has been coded in C. When null values are present in the database or in the updates, incorporating the updates directly in the database may lead to unacceptable expansion in the database size. Using methods reminiscent of those used in concurrency control, we have investigated a lazy evaluation technique which regulates this growth.
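
A toy illustration of the alternative-worlds reading of incomplete information follows. It only filters worlds against the new formula, which is cruder than the published update operators (those also revise individual worlds minimally); the atoms are invented.

    # A database with unknowns denotes a set of complete worlds; an update
    # asserts a formula that every remaining world must satisfy.
    from itertools import product

    def worlds(atoms, known):
        """All truth assignments consistent with the partial knowledge."""
        free = [a for a in atoms if a not in known]
        for values in product([False, True], repeat=len(free)):
            world = dict(known)
            world.update(zip(free, values))
            yield world

    def update(world_set, formula):
        """Keep the worlds satisfying the new information.
        (The published operators are subtler; they also revise worlds.)"""
        return [w for w in world_set if formula(w)]

    atoms = ["a", "b", "c"]
    before = list(worlds(atoms, {"a": True}))          # b and c unknown: 4 worlds
    after = update(before, lambda w: w["b"] or w["c"])  # learn "b or c": 3 worlds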

    Subset Definition

    Person: Wiederhold
The subset concept, formalized within the structural model, permits the definition of subrelations which can be used to structurally define classes of entities where data entries would not be appropriate, thus reducing the need for null management. Since use of this concept leads to a large number of conceptual relations, in practice we will map most subsets kept on one processor system into a single file. Control of such mappings is best carried out at a knowledge-based level [Wiederhold;83a].

    To provide access to a specific subset, we may define a database view. Use of defined subsets cannot resolve all cases where nulls will occur.

    Update Constraints

    Person: Keller
The structural model permits an implementation-independent description of constraints. Most current systems have only incomplete validation facilities and require that programmers provide procedures which reject inappropriate updates. We have shown that the rules of the structural model provide a basis for efficiently validating updates prior to submitting them to a DBMS [Keller;81a]. If assurance is provided that the rules of the structural semantics have been obeyed at execution time, optimal query paths can then be selected while guaranteeing that the query result will not be affected.

    Updating through Views

    Person: Keller
The form, content, and availability of a large database are never completely known to any single user. A user is typically restricted to a view or a set of views which define a window onto the database appropriate to the user's requirements and authority. This restriction becomes yet stronger when working at remote nodes in a distributed network, since a user at any single site can certainly not be expected to have a global view.

    These restrictions lead to a class of problems when the database is to be updated. An update expressed through a view typically has multiple reasonable interpretations in the shared database. For that reason, in most current DBMSs, views can only be used to retrieve information; users operating within a view are not permitted to do updates. This restriction increases the burden on the centralized authority with power to update the database, and introduces errors due to separation of authority over data and authority over database operations.

    We have developed a theory which permits enumeration of all possible interpretations of a view update, and a complete collection of algorithms for the implementation of these interpretations. The theory is based on notions of formal correctness of updates [Keller;82, 85a, c].

    When the view includes the join columns, view update is generally feasible, especially when the key for the view is entirely contained in one of the underlying relations [Keller;81b]. This work has been extended to include other views formed by standard relational operators (selection, projection). This algorithmic approach is in contrast with the information-theoretic approach based on minimal complements [Bancilhon;81], [Keller;84b].

    In addition, we have developed an interactive approach to permit a database administrator to choose the appropriate interpretation for updates at view definition time. Once the interpretation has been fixed, an update expressed against a view will have an unambiguous interpretation in the shared database [Keller;86a].
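
For illustration only, the sketch below spells out two reasonable translations of one view update against hypothetical base relations emp(name, dept) and dept(dept, floor); fixing one of them at view-definition time is precisely what removes the ambiguity.

    # Two translations of the view update "set FLOOR of employee e to f"
    # against a join view over emp(name, dept) and dept(dept, floor).

    def via_dept(emp, dept, e, f):
        """Interpretation 1: the employee's whole department moves to floor f."""
        d = dict(dept)
        d[emp[e]] = f
        return emp, d

    def via_emp(emp, dept, e, f):
        """Interpretation 2: the employee transfers to a department on floor f."""
        targets = [d for d, floor in dept.items() if floor == f]
        if not targets:
            raise ValueError("no department on that floor: update rejected")
        em = dict(emp)
        em[e] = targets[0]
        return em, dept

    emp = {"smith": "toys"}
    dept = {"toys": 1, "books": 2}
    print(via_dept(emp, dept, "smith", 2))   # toys department moves to floor 2
    print(via_emp(emp, dept, "smith", 2))    # smith transfers to books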

    Natural-Language Database Updating

    Person: Davidson
Problems conceptually similar to view updates arise in processing natural-language queries. The difficulty here is that casual users of a natural-language system understand neither the scope nor the details of the underlying database, and so may make requests that:

    1. are reasonable, given their view of the domain, but are nevertheless not possible in the underlying database;
    2. are ambiguous with respect to the underlying database; or
    3. have unanticipated collateral effects upon the responses to earlier questions or upon alternative views of the database.
    The view concept here is a dynamic issue, not subject to predefinition by a database administrator.

    Although considerable research has been devoted to the problem of processing queries expressed in natural language, the issues and techniques for performing natural-language database updates have had little attention. The need to perform natural language updates arises when computer databases support ad-hoc planning. When using the database to plan for hypothetical future scenarios, update interpretation should not depend on the insights of a database administrator. For this situation we have developed artificial intelligence techniques to select a plausible update interpretation based on the user's previous interactions with the database, database semantic constraints, and the goal of minimal side effects. If one update interpretation clearly dominates, the user can proceed in the planning scenario without being forced into a tedious disambiguation dialog.

A theory has been developed that makes it possible to identify a particular user's view of the database and to determine the legality, ambiguity, and potential side effects of updates expressed in natural language. A formal analysis of the problems associated with updates expressed on views (database submodels) is central to this work. A system, pique, based on this approach has been implemented; it processes natural-language updates, explaining problems or options to the user in terms the user can understand, and makes changes in the underlying database with minimal disruption of other views [Kaplan;81a], [Davidson;82, 83a, 83b].

    Finite Differencing

    Person: Paige
We have cooperated in the investigation of the transformational technique of finite differencing for the specification, implementation, and application of derived data in the context of a set-theoretic entity-relationship data model. A framework for the automatic maintenance of derived data has been presented; in this framework, repeated costly computations are replaced by more efficient incremental counterparts. Applications of this approach are in integrity control and in query and transaction optimization [Paige;82, 83a, 83b], [Goldberg;83].
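
A minimal sketch of the finite-differencing idea, with invented names: a derived value is maintained by cheap increments at each update instead of being recomputed by a scan.

    # The derived total tonnage is kept by +/- deltas, never by rescanning.

    class Fleet:
        def __init__(self):
            self.tonnage = {}        # ship -> tons (base data)
            self.total = 0           # derived datum, maintained incrementally

        def insert(self, ship, tons):
            self.tonnage[ship] = tons
            self.total += tons       # delta instead of sum(self.tonnage.values())

        def delete(self, ship):
            self.total -= self.tonnage.pop(ship)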

    Maintenance of Distributed Databases

    Person: Garcia
The management of redundant, distributed data can seriously affect system performance. We have analyzed and developed algorithms for integrity maintenance [Garcia;77 ... 81]. The use of hole-lists to inform nodes of the update status of transactions while passing a minimal amount of information has been shown to be effective [Garcia;79e]. The analyses have also shown that it is difficult to improve on the performance of centralized control methods if sufficient backup can be provided in the responsible nodes [Garcia;81].

    The separation of read-transactions into three classes, namely those that require no or minimal consistency, those that require auditable consistency, and those that require time-critical consistency, can improve aggregate system performance. The problems arising from serving transactions that support planning functions, which require access to large granules of the database, can be greatly reduced by lowering their consistency demands [Garcia;82].

    Distributed Resiliency Mechanisms

    Person: Minoura, Apers
An implied, but rarely realized, benefit of having a distributed database is increased system availability in case of failures. Data availability can be accomplished via replication. Recovery of a node which fails disastrously requires repair and updating of the database at the failed node to achieve consistency.

    Such a system is considered resilient with respect to failures, rather than failure-proof. A local node may not be able to provide full local recoverability by itself. This can be due to economic considerations, since the cost of local secure redundant storage can be excessive for small systems, or can be due to operation in a high-risk environment. Techniques to use other nodes within the distributed network to restore failed nodes, whose local storage has been damaged, have been explored and documented [Minoura;78, 82, 83a, b]. We are obtaining feedback about their use in financial and manufacturing systems.

    Subsequent work in this area deals with a classification of transactions in relation to resiliency. By formally defining a set of transactions which can proceed when the database is partitioned, a set which can be conditionally handled, and a set which cannot proceed, we expect to enhance the operability of distributed databases in conditions of adversity [Apers;84a,d].

    Querying Databases

To obtain information from a database, it must be queried. High-level interaction with databases by untrained personnel using natural language should be possible. We have applied front-end processors for natural-language queries based on previous work such as ladder and ida [Sagalowicz;77], [Hendrix;77].

We have extended the approaches to natural-language understanding addressed in earlier work [Gardner;81]. The intent here was to avoid reinventing the wheel, so that we could build on research performed earlier and study the many remaining problematical issues in human interaction with databases [Rowe;82c].

When querying, issues of incomplete knowledge on the part of the users arise; here the structural model can provide guidance in query formulation and in response processing. We have also tested the approach to response analysis developed by [Kaplan;79a, b, f, 80].

    Using the Graph Model of the Database

    The structural and similar semantic models can be exploited to obviate the need for joins for all those connections which are clearly understood. The concepts of ownership, reference, and subset identify ISA hierarchies which can be exploited to locate attributes at higher levels of abstraction. This means that queries giving only attributes, and not relation names, can be understood and processed according to the semantic model, as defined in gordas [ElMasri;81]. The computational effort appears to be much less than that required to deduce dynamically those relationships that are based on dependencies in a universal-relation model.
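
A toy sketch of such attribute-only query resolution follows; the relations, attributes, and ISA links are invented, and real gordas-style resolution handles ownership and reference connections as well.

    # Resolve a query that names only attributes by walking ISA links to
    # find relations that carry the attributes, directly or by inheritance.

    ATTRS = {"ships": {"name", "tonnage"}, "tankers": {"capacity"}}
    ISA = {"tankers": "ships"}            # tankers ISA ships

    def carries(relation, attr):
        r = relation
        while r is not None:
            if attr in ATTRS.get(r, set()):
                return True
            r = ISA.get(r)                # inherit attributes from the parent
        return False

    def relations_for(attrs):
        return [r for r in ATTRS if all(carries(r, a) for a in attrs)]

    print(relations_for({"tonnage", "capacity"}))   # -> ['tankers']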

    Cooperative Interactive Interfaces to Databases

    Person: Kaplan, Davidson
In an environment where a user may have incomplete or incorrect knowledge of the structure of available data, queries or updates may be posed that are unanswerable, impossible to perform, or otherwise misguided. Our work has shown that such misconceptions can often be detected directly from the phrasing of the user's request and a knowledge of the structure of the database [Kaplan;80]. This information can be used to correct the user's misconceptions and inform the user about the nature of the database and database system. The usefulness of our techniques for cooperative processing of transactions has been demonstrated in a library environment [Corella;84]. These techniques need to be extended to distributed environments, where the knowledge necessary to aid users in use of the database may partially reside at remote sites.

    Descriptive Responses to Database Queries

    Person: Kaplan, Davidson, more
The typical response to a database query is to list the set of items from the database that satisfy the conditions presented in the query. This list can be excessively long and consequently may be inappropriate for a conversational system. Often, a more appropriate response to such queries is a description of the set, rather than a listing of its elements. For example, the response ``All corporate officers'' may be more concise and informative in response to ``Which employees profit share?'' than a list of 1,000 names. Practical techniques for producing a significant class of such responses from existing database systems, without using a separate world model or knowledge base, have been implemented [Kaplan;81b, 82a, b].

    Information from the structural model can provide useful hints for summarization. Since a referenced entity provides a higher level abstraction of the referencing objects, the typically small set of reference connections can provide the candidate relations for descriptive responses.
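
In the simplest case the technique reduces to testing whether the answer set coincides with a set the schema can already name, as in this illustrative sketch (all names invented):

    # Answer with a description when the answer set equals a nameable subset.

    def descriptive_response(answer, named_sets):
        """named_sets: description -> set of identifiers known to the schema."""
        for description, members in named_sets.items():
            if answer == members:
                return f"All {description}"
        return sorted(answer)        # fall back to enumerating the elements

    profit_sharers = {"Jones", "Kim", "Rossi"}
    named = {"corporate officers": {"Jones", "Kim", "Rossi"}}
    print(descriptive_response(profit_sharers, named))  # "All corporate officers"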

    Task Models

    Person: Davidson, Rowe
Certain entities in a database--be they fields, attributes, relations, or more complicated constructs--have certain probabilities of being required in an interrogation of the database. These probabilities depend on the particular task a user may be performing and on his or her focus and interests [Grosz;77a,b].

    By incorporating a task model into the KBMS, these characteristics can be used in a variety of ways to improve the utility of the system. Four main uses of task models have been explored in a KBMS:

    1. inclusion of relevant information not explicitly requested in the response to a query;
    2. organizing responses so that the more interesting items are presented first;
    3. checking for semantic irregularities in the performance of the task; and
    4. prefetching of items and fields that have not yet been requested but are likely to be in the near future [Rowe;82d, 83c] [Davidson;82].
We found that tracking the user's focus dynamically during a single session was beyond our capabilities.
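
As an illustration of the fourth use, a task model can drive prefetching directly from access probabilities; the probabilities and field names below are invented.

    # Task-model-driven prefetching: fetch in advance the fields most
    # likely to be requested next under the current task.

    TASK_MODEL = {
        "cargo-routing": {"ship.position": 0.9, "port.capacity": 0.7,
                          "ship.draft": 0.2},
    }

    def prefetch(task, fetch, threshold=0.5):
        """Issue fetches for fields whose access probability exceeds the
        threshold under the task model."""
        for field, p in TASK_MODEL.get(task, {}).items():
            if p >= threshold:
                fetch(field)

    prefetch("cargo-routing", fetch=print)   # fetches position and capacity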

    Semantic Query Optimization

    Person: King
A request for information can often be formulated in more than one way, depending on knowledge about the subject domain and the ingenuity employed in determining the best access path to the desired information [Martin;77]. A question about all ships currently carrying iron ore, for example, can be answered by looking only at information about bulk ore carriers, assuming that it is known that only bulk ore carriers carry iron ore.

    Semantic query optimization is an approach to improve the performance of information requests that uses such domain knowledge automatically. The objective is to transform a query into a semantically equivalent one (i.e., one that produces the same answer but can be processed more efficiently).

    This work demonstrates that semantic knowledge about the database can be used in a new way to achieve efficient data retrieval. The method supplements conventional query optimization methods. Drawing on semantic, structural, and processing knowledge, it makes use of the heuristics of query processing developed in prior research. A system, quist, has been implemented to explore this approach. Improvements have been demonstrated in a range of query types by means of transformations that add or delete constraints on attributes or even entire files, as warranted by the database and query structure. In other cases, the system detects unsatisfiable query conditions without inspecting the database, or aborts inference when it determines that no improvement can be expected. Analysis overhead is low in all cases [King;79, 80a, b, 81a, b].
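
The iron-ore example can be caricatured in a few lines: a domain rule licenses an added conjunct that narrows the data inspected while leaving the answer unchanged. This is only a sketch of the idea, not of quist itself.

    # A rule maps a predicate in the query to an extra, cheaper-to-check
    # predicate that is implied by domain knowledge.
    RULES = [
        # "only bulk ore carriers carry iron ore"
        (lambda q: ("cargo", "=", "iron ore") in q,
         ("ship_type", "=", "bulk ore carrier")),
    ]

    def semantically_optimize(query):
        """query: a set of (attribute, op, value) conjuncts."""
        q = set(query)
        for applies, extra in RULES:
            if applies(q):
                q.add(extra)          # equivalent query, smaller search space
        return q

    q = {("cargo", "=", "iron ore"), ("destination", "=", "Rotterdam")}
    print(semantically_optimize(q))   # now restricted to bulk ore carriers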

    Common Expression Analysis

    Person: Finkelstein
Another optimization task depends on expressions and their results from prior queries. We have shown that common expression analysis can be applied in three different settings for the optimization of database requests. The query graph representation of queries was originated for this work. It represents the structure of requests in an intuitively clear manner. Also, the decomposition into nodes (relation occurrences in queries, labelled with internal selection predicates and projected attributes) and edges (join predicates) serves as a convenient structure for automatic analysis in query optimization.

We have written a program that detects whether or not the result of one query can be used to support answering another query. Finkelstein presents both the query graph and the first two optimization settings listed below [Finkelstein;82].

    1. In an interpretive system, answers to previous queries, or temporaries formed in the process of obtaining them, may significantly reduce processing time for subsequent queries. We have defined a methodology, based on the query graph representation, for deciding cheaply whether such temporaries are helpful.
    2. A transaction which contains multiple queries may perform redundant database operations. We have described a procedure for optimization of a collection of queries, which involves computing least upper bounds on query graph nodes (which correspond to scans through relations which might profitably be combined). These are extended to maximal subexpressions between queries, and a heuristic procedure examines possible query revisions to find the best execution procedure.
    3. We have extended the concept of optimization of generated code to languages with embedded queries (such as sql in pl/i). We have analyzed how queries in loops can be optimized differentially. Also, we have studied how global flow analysis can be used to determine available database expressions at places in the program.
    This work is now being applied within the IBM San Jose Research Laboratory to prototype relational database systems.
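
The flavor of the representation can be conveyed by a skeletal query-graph structure and a deliberately crude test for when a stored temporary can support a later query; the real analysis in [Finkelstein;82] is considerably more general.

    # Nodes carry selection predicates and projected attributes; edges
    # carry join predicates between relation occurrences.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Node:
        relation: str
        selections: frozenset = frozenset()   # e.g. {("year", ">", 1980)}
        projection: frozenset = frozenset()   # attributes kept

    @dataclass
    class QueryGraph:
        nodes: list
        edges: list                           # (node_i, node_j, join_predicate)

    def node_subsumes(prior, new):
        """A stored temporary for `prior` can help answer `new` if it is over
        the same relation, filters no more, and keeps every needed attribute."""
        return (prior.relation == new.relation
                and prior.selections <= new.selections
                and new.projection <= prior.projection)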

    Menu-Guided Cooperative Ad Hoc Query Interface

    Person: Sang Cha
Ad hoc query interfaces based on a linear-syntax language burden end users with the need to remember the precise syntax and semantics of the query language as well as the structure of the data. This is true not only of interfaces based on formal languages like SQL but also of the more natural, yet still artificial, ones supported by many natural-language interface systems. While natural-language interfaces were initially motivated by the difficulties of using formal ones, their success has been limited by their incomplete linguistic and domain knowledge. The imperfection of a natural-language interface is more problematic to end users, causing psychological interference between the user's habitual language and the language the system accepts. Recent work on transportable natural-language interface systems provides user-friendly domain-knowledge acquisition tools on top of a structured knowledge base. Such tools help reduce the end-user problem by making it easy to add domain knowledge, but they offer only a partial solution in the face of the indefinite amount of system knowledge required to handle open-ended user behavior.

Our research locates the problem instead in passive and uncooperative system behavior. We propose an active ad hoc query interface system that is cooperative at the early stage of the user's query composition. This also distinguishes our approach from that of other cooperative systems based on post-query corrective or informative responses. To implement the early cooperative response, we borrow the idea of menu-guided query creation from the so-called menu-based natural-language interface NLMenu of Texas Instruments, Inc., in which users are guided by a pseudo-natural query-language grammar. With menu guidance, a query is constructed incrementally through multiple steps of menu interaction. At each step, a new menu state is predictively generated by the system for the next legitimate increment to a partially constructed query. As the user chooses one of the presented choices, the system proceeds to the next step.

In addition to the grammatical knowledge of a query language, the system needs structured domain knowledge to make the menu guidance a cooperative response. We found the following knowledge to be essential: (1) conceptual knowledge of the domain, which allows the system to talk about objects and the legitimate relationships among them; and (2) domain and integrity-constraint knowledge for further pruning the choices that lead to intensional failures. The lack of such knowledge not only may lead to a failing or meaningless query, as experienced in NLMenu, but also reduces the effectiveness of menu interaction by flooding the menu space with irrelevant choices and thus increasing the user's choice search time.

Compared with conventional natural-language interface systems, our approach is based on a normative system assumption. The system works with a relatively small grammar, vocabulary, and knowledge base; moreover, these can be made to grow incrementally.

To demonstrate the effectiveness of the menu-guided cooperative ad hoc query interface described in this section, we plan to implement it on a XEROX lisp machine with a SQL DBMS configured on a remote server. The idea of a menu-guided cooperative system carries over to formal query-language interfaces as well. In addition to testing over a small subset of a natural-language grammar, we plan to create a menu-guided cooperative interface for a subset of SQL. We expect such an SQL interface to be useful to casual and advanced database users, while the interface based on natural language should suit a wider range of users.
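
The core of predictive menu generation can be sketched as follows, with a toy grammar and schema: at each step only the grammatically legal next tokens are offered, pruned by schema knowledge of meaningful relationships.

    # Grammar states drive legal next tokens; the schema prunes choices
    # that would lead to intensional failures. All names are invented.

    GRAMMAR = {                       # state -> {token: next state}
        "START": {"find": "ENTITY"},
        "ENTITY": {"ships": "REL", "ports": "REL"},
        "REL": {"visiting": "ENTITY2", "carrying": "CARGO"},
    }
    SCHEMA = {("ships", "visiting"), ("ships", "carrying"),
              ("ports", "visiting")}   # ports do not "carry" anything

    def next_menu(partial):
        state, last_entity = "START", None
        for token in partial:
            if state == "ENTITY":
                last_entity = token
            state = GRAMMAR[state][token]
        choices = GRAMMAR.get(state, {})
        return [t for t in choices
                if state != "REL" or (last_entity, t) in SCHEMA]

    print(next_menu(["find", "ports"]))   # ['visiting'] -- 'carrying' pruned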

    Use of a Statistical Abstract of a Large Database

    Person: Rowe
The size of data sets subjected to statistical analysis is increasing as computer technology becomes more sophisticated [Wiederhold;84c]. For analysis it is an attractive alternative to have quick estimates of descriptive statistics rather than exact values obtained with considerable delay. We have demonstrated a new technique for estimating database statistics that is a top-down alternative to the bottom-up method of sampling. This approach precomputes a set of general-purpose database statistics (a database abstract) and then uses a set of approximately 400 inference rules to make bounded estimates of other, arbitrary statistics requested by users.

The inference rules comprise a new example of an artificial-intelligence expert system. There are several important advantages of this approach over sampling methods, as has been demonstrated experimentally in estimating a variety of statistics on two different databases [Rowe;81b, 82b, d, e, f, 83a, b, d].
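
One such inference rule can be illustrated concretely: for disjoint subsets, the mean over their union is bounded by the extreme subset means and, given counts, is even derivable exactly. The abstract contents below are invented.

    # Top-down estimation from a database abstract: precomputed statistics
    # plus a generic rule yield a statistic never computed directly.

    abstract = {   # precomputed per-subset statistics (the "abstract")
        "tankers":    {"count": 800,  "mean_tons": 90_000},
        "freighters": {"count": 2400, "mean_tons": 14_000},
    }

    def bounds_for_union(stats, subsets, field):
        """The mean over a disjoint union lies between the smallest and
        largest subset means; a count-weighted average gives it exactly."""
        means = [stats[s][f"mean_{field}"] for s in subsets]
        counts = [stats[s]["count"] for s in subsets]
        exact = sum(m * c for m, c in zip(means, counts)) / sum(counts)
        return min(means), exact, max(means)

    print(bounds_for_union(abstract, ["tankers", "freighters"], "tons"))
    # -> (14000, 33000.0, 90000), without touching the data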

    Applications

In order to gain experience with our concepts and validate their applicability, we also carry out research in application areas. We have obtained databases of merchant ships, of computer circuits, and of chronic disease patients to support this work.

    Experiments with Database Technology to Support VLSI Design

    Person: Milton, Beetem, Winslett
The implementation of a large and complex device requires a massive design effort. The length of the design cycle of non-regular VLSI devices is often critical. For a complex device, approaches involving only a single designer are either too slow or make so many compromises that they utilize chip area very poorly. If, instead, a team of designers works together on a project, they will require much support for communication and task scheduling. We have applied our database-management techniques to aid the VLSI design process. The database is used here as a communication medium among people working at different levels on the same object, namely the same device or system. This use of a database is in contrast to current design-automation aids, which use task-specific files without automatic feedback of analysis results or decisions made by the designers.

Logic and circuit design information for two devices, an ALU and a pdp-11 CPU, and the component library required to build them, have been placed into databases. The lower-level data elements can be automatically generated using an existing design-automation program. Performance measurements indicated a performance degradation of only a factor of about 2 versus the use of specialized design files [Wiederhold;80b]. Modification of lower-level elements during the design process is signalled automatically, using a height-first algorithm, to the related parent levels, so that this detailed knowledge can be incorporated in the higher-level abstraction when these are accessed during successive design iterations [Beetem;82], [Wiederhold;82b].
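
The signalling idea can be sketched as upward propagation of staleness marks through the design hierarchy; the traversal below is a simple upward walk, not the published height-first algorithm, and the hierarchy is invented.

    # When a low-level element changes, every ancestor abstraction is
    # marked stale so the change is seen at the next design iteration.

    parents = {"nand-3": ["alu-slice"], "alu-slice": ["alu"], "alu": []}
    stale = set()

    def signal_change(element):
        frontier = [element]
        while frontier:
            node = frontier.pop()
            for parent in parents.get(node, []):
                if parent not in stale:
                    stale.add(parent)      # re-derive abstraction on access
                    frontier.append(parent)

    signal_change("nand-3")
    print(stale)    # {'alu-slice', 'alu'}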

We have developed access techniques for the design data which permit retrieval of data that is only partially stored. If required elements are not stored, they are generated from higher-level circuit or logic specifications. Partial storage is important: the volume of lower-level elements in a VLSI design can be greater than is manageable by the largest storage devices now available, so that methods for VLSI design automation will need access to a mix of generatable, regular elements and instantiated, irregular elements if they are to handle large circuits that are not totally regular [Wiederhold;82b].

Our initial experiments utilized a codasyl database system, DBMS-20. We are in the process of evaluating our approaches in other database environments to make our results more accessible to practitioners. We have ported the database to a relational and object-oriented system at XEROX Palo Alto Research Center and evaluated the performance of the VLSI-design programs in both a purely relational and an entity-relationship model implementation [Winslett;84, 85a]. The entity-relationship model improved performance, but the overriding factor in both versions was the cost of writing the results to disk.

An important byproduct of this work is an assessment of the tradeoffs among database implementation models. Much argument has been generated in the arena of relational versus network implementations. We believe that some solid experiments in this one critical area can help to quantify the issue in an objective fashion [Milton;83].

We have begun to consider the issue of the automatic management of design versions. This task is performed now by databases that support engineering change control [Olumi;83]. The same problem is being addressed in conventional databases that are concerned with time or planning updates, but there are as yet few results to draw upon, so that in fact the design area may be able to lead in this research field [Belady;77], [Bennett;82], [Parnas;76], [Rochkind;75]. The high-capacity permanent storage provided by optical disks permits a greater retention of past versions [Rathmann;84].

    Planning

    Person: Pednault
Databases are important resources for planning. Much planning is based on extrapolation from past events and includes hypothetical data defining alternate future scenarios. The intrinsic complexity of planning the achievement of a goal from a current state is high.

In research performed at the SRI International AI Center, theoretical issues in planning are being investigated. A method which permits plans to be created, although not necessarily executed, efficiently is the focus of current research [Pednault;84], [Manna;86]. The method can be related to many techniques which had a heuristic base.

    Experiments in Intelligent Data Analysis (RX)

    Person: Blum
If database analysis can deal effectively with both past facts and multiple future scenarios, its use as a planning tool will be greatly enhanced. There remains a need for improving the corresponding analytical support capability of planning systems. Many of the critical data found in large systems appear in a time sequence. Our existing systems have not dealt well with the special demands stemming from the temporal relationships among the data. In related projects (acme, aamrs, aramis, rx) we have had the opportunity to structure, collect, and analyze large medical databases. In such databases, for instance tod, time is a key attribute [Weyl;75]. Some of this work was performed at a system level [Wiederhold;80a, c, 81a, c, 82c, f], [Kuhn;82, 84], while other work analyzes the content statistically [Blum;81].

    Content analysis requires large collections of well maintained data. Many years of operational use were required to get to this stage. In this research we have developed tools that use artificial intelligence techniques to assist in the processing of time-oriented data. A frame-based model is used to capture domain knowledge. This model assists in combing the database for unexpected time-lagged correlations and reporting such events as prospective hypotheses. If analysis of the finding appears warranted, a statistical model is built that includes all possible covariates given in the knowledge base.

The appropriate statistical tests are selected by applying a collection of rules that use the descriptions of the data from the database schema and the applicability conditions of the statistical procedures. If the findings turn out to be significant, they are added to the knowledge base for use in another iteration of hypothesis generation and testing. This approach to knowledge generation complements the familiar, but less formal, approach taken by expert systems [Shortliffe;73, 76, 79].
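
Such rule-based method selection amounts to matching schema descriptions against the applicability conditions of the procedures, as in this invented miniature:

    # Rule-based selection of a statistical method from schema descriptions;
    # the rules and method names are illustrative, not those of rx.

    def choose_method(outcome_type, predictor_type, longitudinal):
        if longitudinal:
            return "time-lagged multivariate regression"
        if outcome_type == "numeric" and predictor_type == "numeric":
            return "linear regression"
        if outcome_type == "categorical":
            return "contingency-table analysis"
        return "no applicable method"

    print(choose_method("numeric", "numeric", longitudinal=True))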

    This approach has been demonstrated by using a small subset of a time-oriented rheumatology database (aramis) [Fries;79]. The statistical methods now available are limited to multi-variate regression. Tentative hypotheses were generated and automatically tested for statistical significance. Several verified hypotheses could be added to the original loaded knowledge base [Blum;82a, b, c].

    Work is continuing in the medical arena, but its applications to other databases (e.g., military manpower management) are also under way. Basic research to deal with reduction of the inherent complexity of hypotheses generation is continuing.

    Tools

As part of the research we have developed software tools. Some of these have had broader applicability and have been used as prototypes for commercial and industrial database modules. We are investing further in this area, partly so that we may rapidly build demonstration systems, but also to demonstrate that future database and knowledge-base systems can be effectively assembled from standard modules.

    Modules developed and under consideration include a symmetric file access system, a database definition language processor, a relational algebra processor, and recovery support. We hope to integrate these on the naxos system, a vax 750 dedicated to experimentation with advanced databases.

    File Access System

    Person: Allchin, Qian, Keller
A file access system has been developed and tested that uses symbolic keys to access variable-length records; it is written in pascal and supports several host languages, including pascal, interlisp, and fortran. The services that this system, named flash, expects from the underlying operating system are limited to:
    1. directory management for named segments of secondary storage, and
    2. access to fixed size blocks or pages of these segments.
In a multi-user environment some locking facilities are also needed. Since this subproject is the basis for long-range database system development, reliability and efficiency have been major design and implementation objectives. The system is specifically designed to provide strong and symmetric support facilities for databases, so that powerful database systems become easier to implement than they are using conventional files designed with only the programmer's needs in mind. The underlying structure uses B+ trees for storage of both primary and secondary keys [Allchin;80]. This system will be used to study various dynamic storage and retrieval strategies. The experience of implementing flash has been used to define an input-output package for the Ada language [Keller;86c].

While traditionally the notion of a primary key has implied both a high degree of locality and uniqueness of the key, it is desirable for a database-driven system not to make that distinction. Included in this development is a port to dec vax equipment; the available unix software is particularly deficient in the area of file management.
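
The sketch below suggests the kind of symmetric, key-based interface involved, with an in-memory dictionary standing in for the B+ trees; it is not the actual flash interface, and all names are hypothetical.

    # Variable-length records reached through symbolic keys, with primary
    # and secondary keys treated alike: any indexed attribute is an access key.

    class KeyedFile:
        def __init__(self):
            self.records = {}             # record id -> record (dict)
            self.indexes = {}             # key attr -> value -> {record ids}
            self.next_id = 0

        def create_index(self, attr):
            index = {}
            for rid, rec in self.records.items():
                index.setdefault(rec[attr], set()).add(rid)
            self.indexes[attr] = index

        def insert(self, record):
            rid, self.next_id = self.next_id, self.next_id + 1
            self.records[rid] = record
            for attr, index in self.indexes.items():
                index.setdefault(record[attr], set()).add(rid)
            return rid

        def fetch(self, attr, value):
            """Symmetric access: no attribute is privileged as 'primary'."""
            return [self.records[r] for r in self.indexes[attr].get(value, ())]

    f = KeyedFile()
    f.create_index("name"); f.create_index("port")
    f.insert({"name": "Mirabella", "port": "Oslo"})
    print(f.fetch("port", "Oslo"))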

    Intelligent File Systems

    Person: Shih
The understanding we have developed in the file system area leads us to consider hardware implementations for these facilities. The processing capability of modern disk controllers is such that a substantial fraction of a file system could be provided outboard of the main processors.

    Such an intelligent file server would respond to high level queries. Whereas conventional equipment requires the processor to specify a device and a storage address for retrieval, we foresee having to submit only a symbolic name and a key. The objects to be retrieved will no longer be fixed-length blocks, but sets of variable length records.

The difficulties of moving advanced file functions out of the processor's realm lie in the respecification of the interface. Linkages for serial access within the retrieved sets, algorithms for memory allocation, and methods for setting and managing locks for concurrent access all need analysis and formalization.

    Database Machines

    Person: Shaw
One focus of research activity within the KBMS project has involved the design and analysis of alternative hardware machine architectures oriented toward the very rapid execution of the relational operations which arise frequently in large-scale database management applications. This research has yielded certain surprising and potentially important results which may ultimately prove useful in designing cost-effective, extremely high-performance database machines [Shaw;79].

    These results have led to a ``high-level'' design for a specialized non-von Neumann machine, portions of which would be implemented in VLSI, which would support the highly efficient evaluation of the most computationally complex operators of a relational algebra, specifically projection with elimination of duplicate tuples and the equi-join. Based on a hierarchy of associative storage devices, this architecture permits an O(log n) improvement in time complexity over the best known evaluation methods for these operators on a conventional computer system, without the use of redundant storage, and using currently available and potentially competitive technology. In many cases of practical import, the proposed architecture should also permit a very significant improvement (by a factor roughly proportional to the size of the primary associative storage device) over the performance of previously implemented or proposed database machine architectures based on associative secondary storage devices [Shaw;80a, b, 81]. This work is the basis for further VLSI data-oriented architectural research in progress at Columbia University.

    Access to Optical Storage

    Person: Rathmann
Optical storage technology promises to greatly increase the capacity and reduce the cost of data systems. Optical storage comes with media which are either:
1. created from master disks, and can subsequently only be read;
2. written once by appending blocks of data, but never erased; or
3. erased and rewritten, similar to magnetic storage.
    The type of optical storage which we consider in particular uses optical disks which are writable once, but can be read often. It is these devices that can serve data-processing needs because of their incremental writability and their high storage density.

While research on rewritable optical media is progressing, it seems that the linear density will be reduced by at least a factor of 2, leading to a reduction in storage capacity by a factor of 4.

    With this development, new problems have to be addressed. At this time we recognize that:

    1. more data will be replicated because storage costs are dropping more rapidly than communication costs
    2. more versions of past data will be kept
    3. the index mechanisms used for smaller, rewritable devices are not directly applicable to this medium.
While we are addressing the problem of replication in general, the problems of indexing and access to versions have our particular attention. In our databases we find that the space occupied by indexes can exceed the storage space used for the data. Traditional indexing techniques depend on wholesale or incremental update of the index trees. This means that escaping to magnetic storage for index maintenance will create other restrictions. Either past versions will not be indexed, so that they are lost for practical purposes, since scanning these high-volume storage devices will take many hours, if not days; or the granularity of indexing will be high, so that one retrieval must fetch millions of characters, which must then be post-processed.

We have developed a technique whereby indexing structures, in particular tries and various kinds of trees [Knuth;73], are stored incrementally within the data blocks as they are written [Rathmann;84]. This appears to solve the basic problem, although further investigations are foreseen. The trie approach has been implemented [SYTH;85] in an optical version of flash, a language-independent portable file access system [Allchin;80a].
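
A toy version of append-only indexing conveys the constraint: each appended block carries a little index information (here, just a back pointer chaining blocks by key prefix), so lookups follow chains instead of updating a tree in place. The trie scheme of [Rathmann;84] is far more refined.

    # The medium is append-only; a small rewritable table (or a scan of the
    # newest blocks) locates the chain heads.

    log = []                       # the write-once medium: append only
    heads = {}                     # working storage: key prefix -> latest block

    def append(key, payload):
        prev = heads.get(key[0])             # chain blocks by first character
        log.append({"key": key, "payload": payload, "prev": prev})
        heads[key[0]] = len(log) - 1

    def lookup(key):
        """Follow the chain for the key's prefix; the newest version wins."""
        block = heads.get(key[0])
        while block is not None:
            entry = log[block]
            if entry["key"] == key:
                return entry["payload"]
            block = entry["prev"]
        return None

    append("mirabella", "voyage 7")
    append("mirabella", "voyage 8")          # a newer version, simply appended
    print(lookup("mirabella"))               # -> 'voyage 8'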

    Conclusion

Our work on Knowledge-Based Management Systems, the Management of Distributed Knowledge, and Management of Knowledge and Databases at Stanford and SRI International has demonstrated that substantial improvements in the utility of database systems result from the application of techniques borrowed from Artificial Intelligence. This work also incorporated considerations of distributed semantics. The incorporation of higher-level semantic knowledge into database systems has been shown to help with the design, improve the manageability, and increase the utility of data in a variety of ways. Upgrading databases from passive systems which mainly store information to active systems which understand information, by incorporating knowledge of the data and the domain into the system itself, is a natural and necessary precondition for growth in such systems. Our research conclusions can be summarized as follows:
  • Traditional, algorithmic techniques provide a solid foundation for work in databases, but the models used cannot cope adequately with the complexities and the scale of systems of realistic proportions.
  • The techniques developed in artificial intelligence research apply well to the database world because of the intrinsic formal abstraction provided by databases in their modeling of the real world. Artificial intelligence research has been most effective where it has focused on highly structured domains that are subject to simplified abstract descriptions. Databases provide a realistic example of such a domain.
  • Within the KBMS framework we have been able to develop innovative approaches to modeling the database needs of diverse user groups and integrating various user database models; improving query processing through the use of task models, semantic query optimization, common subexpression analysis, and descriptive responses; as well as dealing with issues of updating large databases, including problems arising due to incomplete data, incomplete knowledge in the user's view, and natural-language database updates.
  • To support this research we have developed tools, such as portable file-access systems, database technology to support VLSI design, designs for database machines, and methods for optimal design of centralized and distributed physical databases.
  • Our work is finding applications in the design and operation of information systems in the world outside the laboratory. Moreover, it is beginning to contribute to tasks being undertaken by NASA, organizations involved in military manpower planning, and other institutions, including applications for manufacturing, medical, economic planning, and financial systems [Ghosh;79], [Wiederhold;79d].
  • Our work has been characterized throughout by a conviction that future systems will have to deal with large data and knowledge bases. In such environments a user has only a limited view from which to manage the database and infer new information for decision-making. Views may be physically related to the distribution of storage. We are now synthesizing and formalizing the results from our studies. To integrate the concepts developed we have chosen an approach which permits distinct maintenance of data and knowledge, while at the same time integrating the inference processes. To store the knowledge we are employing frames, while the database uses a relational DBMS. Efficiency issues are being addressed through dynamic binding of information as it is brought into memory.

Conceptual Interactions of KBMS Components

                             ---------------------------
                            |           Users           |
                             ---------------------------
                       Free form||               /\             /\ data
                        queries ||               ||             || flow
                          and   ||               ||
                        updates ||               ||
                                \/               ||              ^ control
        -------       ------------------         ||              | flow
       |       |====>| Natural language |        ||
       | User  |     |   front end      |        ||
       |       |      ------------------         || Responses
       | Focus |           |  |   ||             ||
       |       | Lexical   |  |   || Knowledge   ||
       |       | data      |  |   || base        ||
       |       | inter-    |  |   || queries     ||
       |       |  action   |  |   ||             ||  
       |       |           v  |   \/             ||  
       |       |      -----------------     -------------      ------------   
       |       |     |      Domain     |   | Cooperative |    |  Database  | 
       |       |<====|     Knowledge   |===|  response   |<===|  Abstract  |  
        -------      |       Base      |   |  analysis   |    |            |  
                      -----------------     -------------      ------------   
                        ^  |  |   ||             /\  ^                 /\^      
                Organi- |  |  |   || Database    ||  | Access          ||      
                zation  |  |  |   || queries     ||  | path            ||      
                advice  |  |  |   ||             ||  | information     ||      
                        |  |  |   ||             ||  |                 ||      
     -------------\     |  |  |   ||             ||  |                 ||      
     |    Data  |  \    |  |  |   ||             ||  |                 ||      
     |    View  |   \   |  |  |   \/             ||  |                 ||
     |   Models |    \ ------------------------------------            || 
     |     of   |\         Structural Schema               |           ||      
     |  Applica-| |  / ------------------------------------            ||      
     |   tions  | | /   |  |  |   ||             ||  ^                 ||      
     |          | |/    |  |  |   || Data        ||  |                 ||      
     ------------/|     |  |  |   || base        ||  | Access          ||
                  |     |  |  |   || queries     ||  | path            ||
         Structural     |  |  |   ||             ||  | information     ||      
        information     |  |  |   ||             ||  |                 ||      
                        |  |  |   \/             ||  |     ----------  ||   
                  ------------------------------------    | Access   | ||    
                 |        Access Path Organizer       |<=>|statistics| ||    
                  ------------------------------------    |  file    | ||     
                        |  |  |   ||             /\        ---------   ||    
            directives  |  |  |   || queries     ||                    ||
                        |  v  |   \/             ||                    ||  
      -------------------    -----------------   ------------------    ||   
     | Modern File access|  | Commercial DBMS | | VLSI based hard- |   ||    
     |  software, FLASH  |  |   a.o.CODASYL   | |   ware concepts  |   ||    
      -------------------    -----------------   ------------------    ||   
                                  ||             /\                    ||     
                                  ||             ||                    ||     
                                  ||             ||                    ||     
             -------------------------------------------------------------
            |               DATA BASES                                    |     
            |  1: Merchant ships, ports, cargo, voyages                   |     
            |  2: Logic of a DEC-11 machine, circuits, components         |     
            |  3: Time-oriented data from Arthritis patients              |     
             -------------------------------------------------------------     
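
    To make the data flow in the diagram concrete, the sketch below traces one free-form query through the layers: the front end yields a formal query, domain knowledge tightens it (in the spirit of the QUIST work on semantic query optimization, [King;81b]), the access path organizer picks a plan, and the underlying file layer executes it against a database such as the merchant-ships example. This is a minimal, hypothetical illustration in modern Python, not project code; all names (Query, DomainKnowledge, choose_access_path, the sample ships data) are invented for this precis.

        # Hypothetical sketch of the layered KBMS query flow; not project code.
        from dataclasses import dataclass, field

        @dataclass
        class Query:
            relation: str      # target relation, e.g. "ships"
            predicate: dict    # attribute -> required value

        @dataclass
        class DomainKnowledge:
            # Domain rules usable to add implied predicates to a query
            # (semantic query optimization).
            implied: dict = field(default_factory=dict)

            def restrict(self, q: Query) -> Query:
                extra = self.implied.get(q.relation, {})
                return Query(q.relation, {**extra, **q.predicate})

        def choose_access_path(q: Query, indexes: set) -> str:
            # Access path organizer: use an index when one covers a predicate;
            # in the real system this choice is guided by the access
            # statistics file.
            for attr in q.predicate:
                if (q.relation, attr) in indexes:
                    return "index-scan on " + attr
            return "sequential scan"

        def execute(q: Query, database: dict) -> list:
            # Stand-in for the file layer (FLASH / a commercial DBMS):
            # filter stored records against the query predicate.
            return [r for r in database.get(q.relation, [])
                    if all(r.get(a) == v for a, v in q.predicate.items())]

        database = {"ships": [
            {"name": "Elena", "type": "tanker",    "port": "Oslo"},
            {"name": "Maru",  "type": "freighter", "port": "Kobe"},
        ]}
        kb = DomainKnowledge()
        q = kb.restrict(Query("ships", {"type": "tanker"}))
        print(choose_access_path(q, {("ships", "type")}))  # index-scan on type
        print(execute(q, database))                        # the tanker Elena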
    

    Stanford Reports and Papers

    These books, papers, and reports document research performed fully or partially under current and preceding ARPA contracts to the investigators collaborating in the KBMS project.
      Entries with `*' are KBMS documents; others are ancillary documents.
    1. [Allchin;80a]* {J.~Allchin, A.~Keller, and G.~Wiederhold: ``FLASH: A Language-Independent Portable File Access System''; Proceedings of the ACM-SIGMOD Conference, May 1980, pp.151-156.}
    2. [Allchin;80b] {J. Allchin: ``Modula and a Question of Time''; IEEE Transactions on Software Engineering, vol.SE-6 no.4, July 1980, pp.390-391.}
    3. [Apers;84a]* {Peter M.G. Apers, and Gio Wiederhold: ``Transaction Classification to Survive a Network Partition''; Stanford CS report STAN-CS-85-1053, August 1984.}
    4. [Apers;84b]* {Peter M.G. Apers, and Gio Wiederhold: ``Expert Database System in a Loosely Coupled Environment''; Proc. First Workshop on Expert Database Systems, Kiawah Island, South Carolina, Oct.1984, Institute of Information Management, Technology and Policy, Univ. of South Carolina, vol.2, pp.611-617.}
    5. [Apers;84c]* {Peter M.G. Apers, and Gio Wiederhold: ``Transaction Handling in a Loosely Coupled Environment''; Proc. ACM Int. Conf., Florence Italy, 1985, North-Holland Pub.}
    6. [Apers;85] {P.M.G. Apers and G. Wiederhold: ``Transaction Handling in a Loosely Coupled Environment''; in Computing 85: A Broad Perspective of Current Developments, G. Bucci and G. Valle (eds), North Holland 1985, pp. 27-36.}
    7. [Arvidson;85] {Raymond Arvidson, Gio Wiederhold, et al: Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences, volume 2; Committee on Data Management and Computation, Space Sciences Board, National Academy of Sciences, January 14, 1985.}
    8. [Banks;85] {Peter M. Banks (chairman SESAC Task Force on Scientific Uses of the Space Station): Space Station Summer Study Report; Michael Wiskerchen and Gio Wiederhold: ``Section 5.2: IOC Facility and Operations Requirements: Communications and Information Systems''; NASA, Washington DC, March 21, 1985.}
    9. [Barr;80] {A. Barr and J. Davidson: Representation of Knowledge; Stanford University CS Report CS-80-793, March 1980; also in The Handbook of Artificial Intelligence, vol I, A.Barr and E.Feigenbaum (eds), 1981.}
    10. [Barsalou;87a]* {T. Barsalou, W.A. Moore, L.A. Herzenberg, and G. Wiederhold: ``A database system to facilitate the design of {FACS} experiment protocols (abstract)''; Cytometry, 97, August 1987.}
    11. [Barsalou;87b]* {Thierry Barsalou and Gio Wiederhold: ``Applying a semantic model to an immunology database''; In W.W. Stead, editor, Proceedings of the Eleventh Symposium on Computer Applications in Medical Care, pages 871-877, IEEE Computer Society Press, Washington, D.C., November 1987.}
    12. [Barsalou;88a]* {Thierry Barsalou: ``An object-based architecture for biomedical expert database systems''; In R.A. Greenes, editor, Proceedings of the Twelfth Annual Symposium on Computer Applications in Medical Care, pages 572-578, IEEE Computer Society, Washington, D.C., November 1988.}
    13. [Barsalou;88b]* {Thierry Barsalou and Gio Wiederhold: ``Knowledge-based Mapping of Relations into Objects''; Computer Aided Design, Butterworths, May 1987.}
    14. [Barsalou;89a]* {Thierry Barsalou and Gio Wiederhold: ``A Cooperative Hypertext Interface to Relational Databases''; Stanford Knowledge Systems Laboratory Report KSL-89-18, March, 1989}
    15. [Barsalou;89b]* {Thierry Barsalou, R. Martin Chavez, and Gio Wiederhold: ``Hypertext interfaces for decision-support systems: A case study''; to be published in Proceedings of {MEDINFO'89}, IFIP, Beijing, China, October 1989.}
    16. [Beetem;82]* {Anne Beetem, Jack Milton, and Gio Wiederhold: ``Performance of Database Management Systems in VLSI Design''; IEEE Database Engineering Bulletin, vol.5 no.2, June 1982, pp.15-20.}
    17. [Blum;78]* {R.L. Blum and G. Wiederhold: ``Inferring Knowledge from Clinical Data Banks Utilizing Techniques from Artificial Intelligence''; Proc. 2nd Symp. on Computer Applications in Medical Care, Nov.1978, Washington DC, IEEE, pp.303-307.}
    18. [Blum;80]* {R.L. Blum: ``Automating The Study of Clinical Hypotheses on a Time-Oriented Database: The RX Project''; MEDINFO 80, Lindberg and Kaihara(eds.), IFIP, North-Holland, 1980, pp.456-460.}
    19. [Blum;81]* {R.L. Blum: ``Displaying Clinical Data from a Time-Oriented Database''; Computers in Biology and Medicine, vol.11 no.4, 1981.}
    20. [Blum;82a]* {R.L. Blum: ``Discovery, Confirmation, and Incorporation of Causal Relationships from a Large Time-Oriented Clinical Database: The RX Project''; Computers and Biomedical Research, vol.12 no.2, pp.164-187, 1982.}
    21. [Blum;82b] {Robert L. Blum: Discovery and Representation of Causal Relationships from a Large Time-Oriented Clinical Database: The RX Project; Lecture Notes in Medical Informatics, no.19, Lindberg and Reichertz, (eds.), Springer-Verlag, New York, 1982, 242 pp.}
    22. [Blum;82c]* {Robert L. Blum and Gio C.M. Wiederhold: ``Studying Hypotheses on a Time-Oriented Clinical Database: An Overview of the RX Project''; Proc.of the 6th Symposium on Computer Applications in Medical Care, Oct.30-Nov.2 1982, Washington, DC, IEEE 82 CH1805-1, pp.725-735.}
    23. [Blum;82d] {Robert L. Blum: Discovery and Representation of Causal Relationships from a Large Time-Oriented Clinical Database: The RX Project; Springer Verlag, Lecture Notes in Medical Informatics no.19, 1982.}
    24. [Blum;86a] {R.L. Blum: ``Computer-Assisted Design of Studies Using Routine Clinical Data: Prednisone Elevates Cholesterol''; Tentatively accepted for publication in Annals of Internal Medicine, currently being revised for publication.}
    25. [Blum;86b]* {R.L. Blum: ``Two-Stage Regression: Application to a Time-Oriented Clinical Database''; Submitted for publication.}
    26. [Brinkley;83] {J.F. Brinkley: ``Artificial Intelligence and Ultrasonic Imaging: The Use of Learned Shape Knowledge to Analyze 3D Data''; Proc. 28th Annual Meeting, American Institute of Ultrasound in Medicine, New York, Oct. 1983.}
    27. [Ceri;81]* {Stefano Ceri, Shamkant Navathe, and Gio Wiederhold: Optimal Design of Distributed Databases; Stanford CS Report STAN-CS-81-884, Dec.1981.}
    28. [Ceri;83]* {Stefano Ceri, Shamkant B. Navathe, and Gio Wiederhold: ``Distribution Design of Logical Database Schemas''; IEEE Transactions on Software Engineering, vol. SE-9 no.4, July 1983, pp.487-563.}
    29. [Ceri;84a] {Stefano Ceri and Giuseppe Pelagatti: Distributed Databases: Principles and Systems; McGraw-Hill, April 1984.}
    30. [Ceri;84b]* {Stefano Ceri, Barbara Pernici, and Gio Wiederhold: ``An Overview of Research in the Design of Distributed Databases''; IEEE Database Engineering Bulletin, vol.7, Dec.1984.}
    31. [Ceri;86]* {Stefano Ceri, Georg Gottlob, and Gio Wiederhold: ``Interfacing Relational Databases and Prolog Efficiently''; Proceedings of the First International Conference on Expert Database Systems, April 1986; to be republished by Benjamin/Cummings; accepted for IEEE TSE, 1987.}
    32. [Ceri;87]* {Stefano Ceri, Barbara Pernici, and Gio Wiederhold: ``Distributed Database Design Methodologies''; Proceedings of the IEEE, Vol.75 No.5, May 1987, pp.533-546.}
    33. [Corella;84]* {Francisco Corella, S.J. Kaplan, G. Wiederhold, and L. Yesil: ``Cooperative Responses to Boolean Queries''; Proc. IEEE Data Engineering Conference; April 1984, Los Angeles CA, pp.77-93.}
    34. [Davidson;80]* {J. Davidson and S.J. Kaplan: ``Parsing in the Absence of a Complete Lexicon''; Proceedings of the 18th Annual Meeting of the Association for Computational Linguistics, Philadelphia PA, June 19-22 1980, pp.105-106.}
    35. [Davidson;82] {Jim Davidson: ``Natural Language Access to Databases: User Modeling and Focus''; Proceedings of the CSCSI/SCEIO Conference; 1982, Saskatoon, Saskatchewan, Canada, May 17-19, 1982, pp.204-211.}
    36. [Davidson;83]* {Jim Davidson and S.J. Kaplan: ``Natural Language Access to Databases: Interpreting Update Requests''; American Journal of Computational Linguistics, April-June 1983, pp.57-68.}
    37. [Davidson;84a]* {Jim Davidson: ``A Natural Language Interface for Performing Database Updates''; Proc. IEEE Data Engineering Conference; April 1984, Los Angeles CA.}
    38. [Davidson;84b] {Jim Davidson: Interpreting Natural Language Database Updates; Ph.D. dissertation, Stanford CS Report to appear, 1984.}
    39. [DeMichiel;89a]* {Linda G. DeMichiel: ``Performing Operations over Mismatched Domains''; Proceedings of the Fifth International Conference on Data Engineering, 1989.} (Received the ``Best Paper'' award at the conference.)
    40. [DeMichiel;89b]* {Linda G. DeMichiel: ``Resolving Database Incompatibility: An Approach to Performing Operations over Mismatched Domains''; submitted by invitation to IEEE Transactions on Knowledge and Data Engineering special issue, 1989.}
    41. [deZegher-Geets;87]* {Isabelle M. de Zegher-Geets, Andrew G. Freeman, Michael G. Walker, Robert L. Blum, Gio Wiederhold: ``Computer-aided summarization of a time-oriented medical data base''; Proceedings of the Third Annual International Conference of the Institute for Medical Records Economics, Chicago, April 26-May 1; also KSL report KSL-87-18.}
    42. [deZegher-Geets;88]* {Isabelle M. de Zegher-Geets, Andrew G. Freeman, Michael G. Walker, Robert L. Blum, Gio Wiederhold: ``Summarization and Display of On-line Medical Records''; M.D. Computing, Vol.5 no.3, March 1988, pp.38-46.}
    43. [Downs;86]* {S.M. Downs, M.G. Walker, and R.L. Blum: Automated Summarization of On-line Medical Records; Stanford Memo KSL-86-6, January 1986; Proceedings of Medinfo'86, North-Holland 1986, pp.800-804.}
    44. [ElMasri;79a]* {R. ElMasri and G. Wiederhold: ``Database Model Integration Using the Structural Model''; Proceedings of the ACM-SIGMOD Conference, Bernstein(ed.), Boston MA, June 1979, pp.191-198.}
    45. [ElMasri;79b]* {R. ElMasri and G. Wiederhold: ``Properties of Relationships and Their Representation''; Proceedings of the 1979 NCC, AFIPS vol.49, Aug.1979, pp.319-326.}
    46. [ElMasri;80]* {Ramez A. ElMasri: On the Design, Use, and Integration of Data Models; Ph.D. dissertation, Stanford CS Report CS-80-801, June 1980.}
    47. [ElMasri;81]* {Ramez ElMasri and Gio Wiederhold: ``Formal Specifications of GORDAS: A High-Level Query Language for Graph Databases''; Proceedings of the Second International Conference on the Entity-Relationship Approach to System Analysis and Design, Washington DC, Oct.12-14, 1981.}
    48. [Finkelstein;82]* {Sheldon Finkelstein: ``Common Expression Analysis in Database Applications''; ACM-SIGMOD International Conference on Management of Data, Orlando FL, June 2-4 1982, pp.235-245.}
    49. [Friedman;85] {F. Friedman, A. Keller, J. Salasin, D.L. Spooner, and Gio Wiederhold: ``Reference Model for ADA Interfaces to Database Management Systems''; Second IEEE Computer Society Data Engineering Conference, Los Angeles CA, February 1986.}
    50. [Friedrich;88a]* {Gerhard Friedrich and Wolfgang Nejdl: ``Increasing the Information Content of Diagnostic Examples Using a Domain Model''; 9th International Workshop on Expert Systems and their Applications - Specialized Conference on Second Generation Expert Systems, Avignon, France, May 1988.}
    51. [Friedrich;88b]* {Gerhard Friedrich, Georg Gottlob and Wolfgang Nejdl: ``Generating Efficient Diagnostic Procedures from Model Based Knowledge Using Logic Programming Techniques''; Computers and Mathematics with Applications, Special Issue: Logic Programming in Intelligent Decision and Control Systems.}
    52. [Garcia;77] {Hector Garcia-Molina and Gio Wiederhold: ``Application of the Contract Net Protocol to Distributed Data Bases''; Stanford University Heuristic Programming Project paper HPP-77-21, April 1977.}
    53. [Garcia;78a] {Hector Garcia-Molina: ``Distributed Database Coupling''; Stanford University Heuristic Programming Project paper HPP-78-4, March 1978.}
    54. [Garcia;78b]* {Hector Garcia-Molina: ``Performance Comparison of Update Algorithms for Distributed Databases, Part I''; Technical Note 143, Stanford University, Computer Systems Laboratory, Departments of Electrical Engineering and Computer Science, June 1978.}
    55. [Garcia;78c] {Hector Garcia-Molina: ``Distributed Database Couplings''; Third USA-Japan Conference Proceedings, AFIPS, San Francisco CA, Oct.1978, pp.75-79.}
    56. [Garcia;78d]* {Hector Garcia-Molina: ``Crash Recovery in the Centralized Locking Algorithm''; Technical Note 153, Stanford University, Digital Systems Laboratory, Departments of Electrical Engineering and Computer Science, November 1978.}
    57. [Garcia;78e] {Hector Garcia-Molina: ``Performance Comparison of Two Update Algorithms for Distributed Databases''; Proceedings of the 3rd Berkeley Conference on Distributed Data Management and Computer Networks, August 1978, pp.108-119.}
    58. [Garcia;78f]* {Hector Garcia-Molina: ``Performance Comparison of Update Algorithms for Distributed Databases, Part II''; Technical Note 146, Stanford University, Computer Systems Laboratory, December 1978.}
    59. [Garcia;79a]* {Hector Garcia-Molina: ``Restricted Update Transactions and Read-Only Transactions''; Technical Note 154, Stanford University, Computer Systems Laboratory, January 1979.}
    60. [Garcia;79b]* {Hector Garcia-Molina: ``Partitioned Data, Multiple Controllers, and Transactions with an Initially Unspecified Base Set''; Technical Note 155, Stanford University, Computer Systems Laboratory, February 1979.}
    61. [Garcia;79c] {Hector Garcia-Molina: ``A Concurrency Control Mechanism for Distributed Databases Which Uses Centralized Locking Controllers''; Proceedings of the 4th Berkeley Conference on Distributed Data Management and Computer Networks, August 1979, pp.113-124.}
    62. [Garcia;79d] {Hector Garcia-Molina: ``Centralized Control Update Algorithms for Distributed Databases''; Proceedings of the 1st International Conference on Distributed Processing Systems, October 1979.}
    63. [Garcia;79e]* {Hector Garcia-Molina: Performance of Update Algorithms for Replicated Data in a Distributed Database; Ph.D. dissertation, Stanford University, Computer Science Department report CS-79-744, 1979.}
    64. [Garcia;80] {Hector Garcia-Molina and Gio Wiederhold: ``Read Only Transactions''; Stanford University Computer Science Department report CS-80-797, April 1980.}
    65. [Garcia;81] {Hector Garcia-Molina: Performance of Update Algorithms for Replicated Data in a Distributed Database; (revised) UMI Research Press, Ann Arbor MI, ISBN 0-8357-1219-2, Aug.1981, 320pp.}
    66. [Garcia;82]* {Hector Garcia-Molina and Gio Wiederhold: ``Read-Only Transactions in a Distributed Database''; ACM Transactions on Database Systems, vol.7 no.2, June 1982, pp.209-234.}
    67. [Gardner;81] {Anne Gardner, Terry Winograd, and Jim Davidson: ``Natural Language Understanding''; The Handbook of Artificial Intelligence, vol. I, A. Barr and E. Feigenbaum (eds), 1981; also Report No. STAN-CS-79-754, Computer Science Department, Stanford University, 1979.}
    68. [Gottlob;86a] {Georg Gottlob: ``Subsumption and Implication''; to appear in Information Processing Letters, 1986.}
    69. [Gottlob;86b]* {Georg Gottlob: ``On the Size of Nonredundant FD-Covers''; submitted for publication, 1986.}
    70. [Ghosh;79]* {Sakti Ghosh, A.F. Cardenas, I. Mijares, and G. Wiederhold: ``Some Very Large Data Bases in Developing Countries''; 5th International Conference on Very Large Databases, Rio de Janeiro, Brazil, Oct.1979, pp.173-182.}
    71. [Goldberg;83] {A. Goldberg and R. Paige: Stream Processing; {Rutgers University}, Technical Report LCSR-TR-46, Aug.1983.}
    72. [Hall;88]* {Keith Hall: ``An Introduction to the Problems of Engineering Information Systems''; IEEE Systems Design and Network Conference, April 1988, pp.85-89.}
    73. [Hall;89]* {Keith Hall, David Knapp, Ken Smith, Gio Wiederhold, and Marianne Winslett: ``Explicit and Implicit Change Coordination in an Information-Rich Design Environment''; NSF Engineering Design Conference, Amherst, June 1989.}
    74. [Kaplan;79a]* {S.J. Kaplan: ``Cooperative Responses from A Portable Natural Language Data Base Query System''; {Ph.D. dissertation}, University of Pennsylvania, July 1979; also Stanford University Heuristic Programming Project paper HPP-79-19.}
    75. [Kaplan;79b]* {S.J. Kaplan, E. Mays, and A.K. Joshi: ``A Technique for Managing the Lexicon in a Natural Language Interface to a Changing Data Base''; Proceedings of the 6th International Joint Conference on Artificial Intelligence, Tokyo, Japan, August 1979.}
    76. [Kaplan;80]* {S.J. Kaplan: ``Appropriate Responses to Inappropriate Questions''; in Formal Aspects of Language and Discourse, Joshi, Sag and Webber(eds.), Cambridge University Press, 1980.}
    77. [Kaplan;81a]* {S.J. Kaplan and Jim Davidson: ``Interpreting Natural Language Updates''; Proc. 19th Annual meeting, Association for Computational Linguistics, June 1981.}
    78. [Kaplan;81b] {S. Kaplan and D. Ferris: ``Hello, This is Your System Speaking''; Computing, vol.9 no.43, Oct.22, 1981.}
    79. [Kaplan;82a]* {S. Jerrold Kaplan: ``Cooperative Responses from a Portable Natural Language Database Query System''; in Computational Models of Discourse, Brady(ed.), MIT Press, 1982, pp.167-208.}
    80. [Kaplan;82b]* {S. Jerrold Kaplan: ``Cooperative Responses from a Portable Natural Language Query System''; Artificial Intelligence, vol.19, Oct.1982, pp.165-187.}
    81. [Kaplan;84]* {S. Jerrold Kaplan: ``Designing a Portable Natural Language Database Query System''; ACM TODS, vol.9 no.1, Mar.1984, pp.1-19.}
    82. [KBMS]* {KBMS project staff: A Precis of Research on Knowledge-based Management Systems; Stanford University, Computer Science Dept., KBMS project (this document).}
    83. [Keller;81a]* {Arthur M. Keller and Gio Wiederhold: ``Validation of Updates Against the Structural Database Model''; Symposium on Reliability in Distributed Software and Database Systems, July 1981, Pittsburgh PA, pp.195-199.}
    84. [Keller;81b] {A.M. Keller: Updates to Relational Databases Through Views Involving Joins; {IBM Research Report}, IBM San Jose and Stanford University, October 1981.}
    85. [Keller;82]* {Arthur M. Keller: ``Updates to Relational Databases through Views Involving Joins''; Proc. 2nd International Conference on Databases: Improving Usability and Responsiveness, Jerusalem, Israel, Academic Press, June 1982, pp.363-384.}
    86. [Keller;84a]* {Arthur M. Keller and Marianne Winslett: ``Approaches for Handling Incomplete Information and Nulls in Databases''; IEEE Data Engineering Conference, Los Angeles CA, April 1984.}
    87. [Keller;84b]* {A.M. Keller and Jeffrey D. Ullman: ``On Complementary and Independent Mappings on Databases''; ACM-SIGMOD 1984 Intern. Conf. on Management of Data, Boston MA, June 1984.}
    88. [Keller;84c]* {A.M. Keller: ``Databases: The Key to Organizing and Sharing Information''; CIS Newsletter, Stanford Univ., June 1984.}
    89. [Keller;85a]* {A.M. Keller: ``Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections, and Joins''; ACM Principles of Database Systems, March 1985, Portland OR, ACM SIGACT and SIGMOD.}
    90. [Keller;85b]* {A.M. Keller and Marianne Winslett: ``On the Use of an Extended Relational Model to Handle Changing Incomplete Information''; IEEE Transactions on Software Engineering, July 1985.}
    91. [Keller;85c]* {Arthur M. Keller: Updating Relational Databases Through Views; {Ph.D. dissertation}, Stanford University, Computer Science Dept., report CS-85-1040, February 1985.}
    92. [Keller;86a]* {Arthur M. Keller: ``The Role of Semantics in Translating View Updates''; IEEE Computer, January 1986, pp. 63-73.}
    93. [Keller;86b]* {Arthur M. Keller: ``Set-Theoretic Problems of Null Completion in Relational Databases''; Information Processing Letters, April 1986.}
    94. [Keller;86c]* {Arthur M. Keller: ``Indexed File Access for Ada''; submitted for publication.}
    95. [Keller;86d]* {Arthur M. Keller: ``Unifying Database and Programming Language Concepts Using the Object Model''; (extended abstract), Int. Workshop on Object-Oriented Database Systems, IEEE Computer Society, Pacific Grove, CA, September 1986.}
    96. [Keller;86e] {Arthur M. Keller: ``A Reliable and Deadlock-Free Multi-Indexed B+-Tree''; Report TR-86-19, Dept. of Computer Sciences, Univ. of Texas at Austin, July 1986, 17pp.}
    97. [Keller;86f] {Arthur M. Keller: ``Updating Relational Databases Through Union Views''; submitted for publication, June 1986.}
    98. [Keller;86g]* {Arthur M. Keller: ``Choosing a View Update Translator by Dialog at View Definition Time''; 12th VLDB, August 1986.}
    99. [Keller;87a]* {Arthur M. Keller: ``Comments on Bancilhon and Spyratos' `Update Semantics of Relational Views'''; ACM Trans. on Database Systems, vol.12 no.3, September 1987, pp. 521-523.}
    100. [Keller;87b]* {Arthur M. Keller and Laurel Harvey: ``A Prototype View Update Translation Facility''; Report TR-87-45, Dept. of Computer Sciences, Univ. of Texas at Austin, December 1987, 11pp.}
    101. [Keller;88a]* {Arthur M. Keller and Gio Wiederhold: ``Concurrent Use of B-trees with Variable-Length Entries''; SIGMOD Record, Vol. 17, No. 2, June 1988, pp. 89-90.}
    102. [Keller;88b]* {Arthur M. Keller and Gio Wiederhold: ``Modularization of the DADAISM Ada Database System Architecture''; July 1988.}
    103. [King;79]* {Jonathan King: Exploring the Use of Domain Knowledge for Query Processing Efficiency; {Stanford University Heuristic Programming Project paper HPP-79-30}, Dec. 1979.}
    104. [King;80a]* {J. King: ``Modelling Concepts for Reasoning about Access to Knowledge''; Proceedings of the ACM Workshop on Data Abstraction, Data Bases, and Conceptual Modelling, Pingree Park CO, June 23-26, 1980, ACM-SIGPLAN Notices, vol.16 no.1, Jan.1981.}
    105. [King;80b] {J. King: ``Intelligent Retrieval Planning''; Proc. of the first National Conference on Artificial Intelligence, Stanford CA, August 1980, pp.243-245.}
    106. [King;81a]* {Jonathan J. King: Query Optimization by Semantic Reasoning; {PhD dissertation}, Technical Report STAN-CS-81-857, Stanford University Computer Science Dept., May 1981.}
    107. [King;81b]* {Jonathan J. King: ``QUIST: A System for Semantic Query Optimization in Relational Databases''; VLDB 7, Zaniolo and Delobel(eds), Sep.1981, pp.510-517.}
    108. [King;84] {Jonathan J. King: Query Optimization by Semantic Reasoning; Univ.of Michigan Press, 1984.}
    109. [Konolige;86a]* {Kurt Konolige: ``Resolution and Quantified Epistemic Logics''; 8th Conference on Automatic Deduction; Oxford, UK, 1986}
    110. [Konolige;86b]* {Kurt Konolige: A Deduction Model of Belief; Pitman Advanced Research Notes in Artificial Intelligence, 1986}
    111. [Konolige;87]* {Kurt Konolige: ``On the Relation Between Default Theories and Autoepistemic Logic''; submitted to IJCAI-87}
    112. [Koo;87]* {Charles C. Koo: ``Synchronizing Plans among Intelligent Agents via Communication''; AAAI-87, July 1987.}
    113. [Koo;88]* {Charles C. Koo, Gio Wiederhold, and Paul Cashman: ``A Commitment-based Communication Model for Distributed Office Environments''; Proc. ACM Conference on Office Systems, Palo Alto, March 1988, ACM Press, pp.291-298.}
    114. [Kuhn;81]* {I.M. Kuhn and G. Wiederhold: ``The Evolution of Ambulatory Medical Record Systems in the U.S.''; SCAMC 5, IEEE, 1981, pp.80-85.}
    115. [Kuhn;82]* {Ingeborg M. Kuhn, Gio Wiederhold, Jonathan E. Rodnick, Diane Ramsey-Klee, Stanford Benett, Donald D. Beck: Automated Ambulatory Medical Record Systems in the U.S.; {Stanford CS Report STAN-CS-82-298}, Stanford Univ., August 1982. Re-published as Chapter 14 in Bruce I. Blum (ed.): Information Systems for Patient Care; Springer Verlag 1984, pp.199-217.}
    116. [Law;89] {Kincho Law and Thierry Barsalou: ``Applying a semantic structural model for engineering design''; to be published in Proceedings of the Computers in Engineering Conference, American Society of Mechanical Engineers, Anaheim, CA, August 1989.}
    117. [Lee;89]* {B.S. Lee and Gio Wiederhold: ``Interfacing Objects and Relational Databases with Views: the System Design Aspect''; Computer Science Dept., Stanford University, Stanford, Jan.1989.}
    118. [Malachi;85] {Y. Malachi, Z. Manna, and R. Waldinger: ``TABLOG: Functional and Relational Programming in One Framework''; IEEE Software, Jan.1986, pp.75-76.}
    119. [Manna;86a]* {Zohar Manna and Richard Waldinger: ``Special Relations in Automated Deduction''; JACM, January 1986}
    120. [Manna;86b]* {Zohar Manna and Richard Waldinger: ``The Origin of a Binary-Search Paradigm''; IJCAI-85; also {SRI Technical Note 351R, October 1986}; to appear in Science of Computer Programming.}
    121. [Manna;86c]* {Zohar Manna and Richard Waldinger: ``How to Clear a Block: A Theory of Plans''; {SRI Technical Note 397, December 1986}.}
    122. [Milton;83]* {J. Milton and G. Wiederhold: ``Network and Relational Systems for VLSI Design: Contradictory Performance?''; Database Engineering, Volume I, Kim, Batory, Hevner, Katz, and Reiner(eds.), IEEE Computer Society Press, 1983, pp.167-170.}
    123. [Minoura;78] {T. Minoura: ``Maximally Concurrent Transaction Processing''; Berkeley Workshop 3, 1978.}
    124. [Minoura;81]* {Toshimi Minoura and Gio Wiederhold: ``Resilient Extended True-Copy Token Scheme for a Distributed Database System''; IEEE Symposium on Reliability in Distributed Software and Database Systems, July 1981, Pittsburgh PA, pp.1-12.}
    125. [Minoura;82]* {Toshimi Minoura and Gio Wiederhold: ``Resilient Extended True-Copy Token Scheme for a Distributed Database System''; IEEE Transactions on Software Engineering, vol.SE-8 no.3, May 1982, pp.173-188.}
    126. [Minoura;83a] {Toshimi Minoura, Susan Owicki, and Gio Wiederhold: True-Copy Token Scheme for a Distributed Database System; Oregon State University Report 83-60-1; in Concurrency Control and Reliability in Distributed Systems, Bhargava(ed), Van Nostrand-Reinhold, 1987, pp.426-453.}
    127. [Minoura;83b] {Toshimi Minoura, Susan Owicki, and Gio Wiederhold: Consistent Distributed Database State Maintenance; Oregon State University CS Report 83-60-2.}
    128. [Missikoff;84] {Michele Missikoff and Gio Wiederhold: ``Towards a Unified Approach for Expert and Database Systems''; Proceedings of the First Workshop on Expert Database Systems, Kiawah Island, South Carolina, Oct.1984, Institute of Information Management, Technology and Policy, Univ. of South Carolina, vol.2, pp.186-206; also in Expert Database Systems, Larry Kerschberg (editor), Benjamin/Cummings, 1986, pages 383-399.}
    129. [Morgenstern;89]* {Mathew Morgenstern, Alex Borgida, Catherine Lassez, David Maier, and Gio Wiederhold: ``Constraint-based Systems: Knowledge about Data''; in Expert Database Systems, L. Kerschberg (editor), Benjamin/Cummings 1989, pp.23-43.}
    130. [Navathe;82]* {Sham Navathe, Stefano Ceri, JinLi Dou, and Gio Wiederhold: ``Vertical Partitioning for Physical and Distribution Design of Databases''; {Stanford Computer Science report STAN-CS-82-957}, Dec.1982; ACM TODS, vol.9 no.4, Dec.1984, pp.680-710.}
    131. [Olumi;83]* {Mohamed Olumi, G. Wiederhold, C. Hauser, P. Lucas, and J. Mehl: ``Software Project Databases''; ACM-SIGMOD Proceedings, May 1983, pp.124-134.}
    132. [Paige;82] {Robert Paige: ``Applications of Finite Differencing to Database Integrity Control and Query-Transaction Optimization''; in Bases Logiques pour Bases de Donnees, Nicolas(Ed.), Onera-Cert, Toulouse, 1982.}
    133. [Paige;83a] {Robert Paige: ``Transformational Programming -- Applications to Algorithms and Systems''; ACM-POPL '83, 1983.}
    134. [Paige;83b]* {Robert Paige: ``Applications of Finite Differencing to Database Integrity Control and Query/Transaction Optimization''; Advances in Database Theory, vol.2, Minker, Nicolas, and Gallaire (eds.), 1983.}
    135. [Pednault;84] {Edwin Pednault: ``A New Approach to Classical Planning''; Artificial Intelligence Center, SRI International, submitted for publication, April 1984.}
    136. [Qian;85a] {XiaoLei Qian: ``FLASH Users Guide''; Second Version, Stanford University, CSD, KBMS project, April 1985.}
    137. [Qian;85b]* {XiaoLei Qian and Gio Wiederhold: ``Data Definition Facility of CRITIAS''; Proceedings of the 4th International Conference on Entity-Relationship Approach, Oct.1985, pp.46-55.}
    138. [Qian;86]* {XiaoLei Qian and Gio Wiederhold: ``Knowledge-based Integrity Constraint Validation''; Proceedings of the 12th International Conference on Very Large Data Bases, Aug.1986, pp.3-12.}
    139. [Qian;87]* {XiaoLei Qian and Douglas Smith: ``Constraint Reformulation for Efficient Validation''; Proceedings of the 13th International Conference on Very Large Data Bases, Sept.1987, pp.417-425.}
    140. [Qian;88a]* {XiaoLei Qian: ``An Effective Method for Integrity Constraint Simplification''; Proceedings of the 4th International Conference on Data Engineering, Feb.1988, pp.338-345.}
    141. [Qian;88b]* {XiaoLei Qian: ``Distribution Design of Integrity Constraints''; Proceedings of the 2nd International Conference on Expert Database Systems, Apr.1988, pp.75-84; also in Expert Database Systems: Proceedings from the Second International Conference, Larry Kerschberg (editor), Benjamin/Cummings, 1989, pp.205-226.}
    142. [Qian;88c]* {XiaoLei Qian and Richard Waldinger: ``A Transaction Logic for Database Specification''; Proceedings of the ACM-SIGMOD International Conference on Management of Data, Chicago, June 1988, pp.243-250.}
    143. [Qian;89]* {Xiaolei Qian: ``On the Expressive Power of the Bounded-Iteration Construct''; Proceedings of the Second International Workshop on Database Programming Languages, Gleneden Beach, Oregon, June 1989, pp.403-413.}
    144. [Rathmann;84a]* {Peter Rathmann: ``Dynamic Data Structures on Optical Disks''; IEEE Data Engineering Conf., Los Angeles CA, Apr.24-27, 1984.}
    145. [Rathmann;84b] {Peter Rathmann: ``A Tool for Optical Disk Data Structure Investigation''; {Stanford University}, Computer Science Dept., KBMS project, Spring 1984.}
    146. [Rathmann;86]* {Peter Rathmann: ``The Cost of Locking''; International Conference on Database Theory, Rome, 1986. Also published in ICDT '86, Lecture Notes in Computer Science, no.243, Ausiello and Atzeni (eds.), Springer-Verlag, 1987.}
    147. [Rowe;79a] {Neil Rowe: ``Network Support for a Distributed Data Base System''; Berkeley Workshop 4, August 1979.}
    148. [Rowe;81a] {Neil Rowe: ``Rule-Based Statistical Calculations on a Database Abstract''; 1st LBL Workshop on Statistical Database Management, Menlo Park CA, December 1981, pp.163-176.}
    149. [Rowe;82a]* {Neil Rowe: ``Diophantine Compromise of a Statistical Database''; Chap. 3, Stanford Computer Science Dept. Report CS-82-948, 1982 and Information Processing Letters, June 1983.}
    150. [Rowe;82b] {Neil Rowe: ``Inheritance of Statistical Properties''; AAAI-82 Conference, Pittsburgh PA, August 18-20, 1982, pp.221-224.}
    151. [Rowe;82c] {Neil Rowe: ``On Some Arguable Claims in B. Shneiderman's Evaluation of Natural Language Interfaces to Databases''; ACM-SIGMOD Record, vol.13 no.1, Sep.1982.}
    152. [Rowe;82d]* {Neil Rowe: Modelling Degrees of Item Interest for a General Database Query System; {Stanford Report} STAN-CS-82-947, October 1982, 37 pp.}
    153. [Rowe;82e] {Neil Rowe: ``Inheritance of Statistical Properties''; National Conference, American Association for Artificial Intelligence, Pittsburgh PA, 1982, pp.221-224. Also part 2 of Stanford CS Report CS-82-948, December 1982.}
    154. [Rowe;82f] {Neil Rowe: Three Papers on Rule-based Estimation of Statistics on Databases; {Stanford Computer Science Report} CS-82-948, Dec.1982.}
    155. [Rowe;83a] {Neil Rowe: ``Top-down Estimation of Statistics on a Database''; Proceedings of the ACM-SIGMOD Conference, May 1983, pp.135-145.}
    156. [Rowe;83b] {Neil Rowe: ``An Expert System for Statistical Estimates on Databases''; National Conference of the American Association for Artificial Intelligence, March 1983.}
    157. [Rowe;83c] {Neil Rowe: ``Some Experiments in Evaluation of an Expert System for Statistical Estimation on Databases''; 2nd Workshop on Statistical Database Management, Lawrence Berkeley Lab technical report, September 1983.}
    158. [Rowe;83d]* {Neil Rowe: Rule-Based Statistical Calculations on a Database Abstract; {Ph.D. dissertation}, Stanford University Computer Science Dept. report STAN-CS-83-975, June 1983.}
    159. [Rowe;84a] {Neil Rowe: ``Diophantine Inferences from Statistical Aggregates on Few-Valued Attributes''; IEEE Data Engineering Conference, Los Angeles CA, April 1984.}
    160. [Rowe;84b] {Neil Rowe: ``Diophantine Inference on a Statistical Database''; Information Processing Letters, 18 (1984) 25-31.}
    161. [Rowe;85] {Neil C. Rowe: ``Antisampling for Estimation: an Overview''; IEEE TSE, Vol.SE-11 No.10, Oct.1985, pp.1081-1091.}
    162. [Rowe;88] {Neil C. Rowe: ``Absolute Bounds on the Mean and Standard Deviation of Transformed Data for Constant-Sign-Derivative Transformations''; SIAM J.Sci.Stat.Comput., Vol.9 No.6, Nov.1988.}
    163. [Sacca;83]* {Domenico Saccà and Gio Wiederhold: ``Partitioning in a Cluster of Processors''; ACM TODS, vol.10 no.1, Mar. 1985, pp.29-56; VLDB 9, Florence Italy, October 31-November 2, 1983, extended abstract only; also IBM Report RJ4076, full paper, 1983.}
    164. [Shaw;79] {David E. Shaw: A Hierarchical Associative Architecture for the Parallel Evaluation of Relational Algebraic Database Primitives; {Stanford Computer Science Department Report} CS-79-778, October 1979.}
    165. [Shaw;80a]* {D. Shaw: ``A Relational Database Machine Architecture''; Proceedings of the 1980 Workshop on Computer Architecture for Non-Numeric Processing, Asilomar CA, Mar.1980; also ACM-SIGMOD Record, vol.X no.4, and ACM-SIGIR Forum, vol.XV no.2, Apr.1980, pp.84-95.}
    166. [Shaw;80b] {D. Shaw: Knowledge-Based Retrieval on a Relational Database Machine; {PhD dissertation}, Stanford University, Computer Science Department report CS-80-823, Sep.1980.}
    167. [Shaw;81]* {D.E. Shaw, S.J. Stolfo, H. Ibrahim, B. Hillyer, G. Wiederhold and J.A. Andrews: ``The NON-VON Database Machine: A Brief Overview''; IEEE Database Engineering Bulletin, vol.4 no.2, Dec.1981.}
    168. [Shuey;86]* {Richard Shuey and Gio Wiederhold: ``Data Engineering and Information Systems''; IEEE Computer Magazine, vol.19 no.1, January 1986, pp.18-30.}
    169. [SongYTH;85]* {ZaiSheng Song, JuanFen Yu, ShiWei Tang, and ChoChun Hsu: ``Optical Disk FLASH''; {Stanford University}, Computer Science Dept., KBMS project, September 1985.}
    170. [Spooner;85]* {David Spooner, Arthur M. Keller, Gio Wiederhold, John Salasin, and Deborah Heystek: Framework for the Security Component of an Ada DBMS; {Tech. Report}, IDA, December 1985, VLDB 12, Kambayashi(ed), Kyoto Japan, Aug.1986, pp.347-354. Also Proc. National Computer Security Center Invitational Workshop on Database Security, Baltimore, June 1986.}
    171. [Swami;88]* {Arun Swami and Anoop Gupta: ``Optimization of Large Join Queries''; Proceedings of ACM-SIGMOD International Conference on Management of Data, 1988.}
    172. [Swami;89]* {Arun Swami: ``A Validated Cost Model for Main Memory Databases''; to appear in ACM SIGMETRICS, 1989.}
    173. [Tran-Dinh-Le;85a]* {Diep Tran-Dinh-Le: ``Report on CRITIAS Implementation''; Stanford University, Computer Science Department, KBMS project, March 1985.}
    174. [Tran-Dinh-Le;85b]* {Diep Tran-Dinh-Le: ``CRITIAS Language Reference Manual''; Stanford University, Computer Science Department, KBMS project, March 1985.}
    175. [Utsumi;87]* {Hiroo Utsumi: ``Update Management System for Remote LISP RDB Queries''; Stanford Univ., CSD, KBMS CS393 report, Aug.1987.}
    176. [WalkerBF;85]* {M.G. Walker, R.L. Blum, and L.M. Fagan: ``Minimycin: A Miniature Rule-Based System''; M.D. Computing, vol.2 no.4, p.21, 1985.}
    177. [Walker;85]* {M.G. Walker and R.L. Blum: ``An Introduction to LISP''; M.D. Computing, vol.2 no.1, p.56, 1985.}
    178. [Walker;86a]* {M.G. Walker and R.L. Blum: Towards Automated Discovery from Clinical Databases: the RADIX Project; {Stanford KSL Memo} KSL-86-7, January 1986; IFIP Medinfo '86.}
    179. [Walker;87]* {M.G. Walker: ``How Feasible is Automated Discovery?''; IEEE Expert, Spring 1987, pp.69-82.}
    180. [Whang;81a]* {K. Whang, G. Wiederhold, and D. Sagalowicz: ``Separability: An Approach to Physical Database Design''; in Proceedings of the Seventh International Conference on Very Large Data Bases, Zaniolo and Delobel(eds.), Cannes, France, Sep.1981, pp.320-332.}
    181. [Whang;81b]* {K. Whang, G. Wiederhold, and D.Sagalowicz: Separability as a Physical Database Design Methodology; {Stanford Report} CSL TR-222 and STAN-CS-81-898, Stanford University, October, 1981.}
    182. [Whang;82]* {K.-Y. Whang, G. Wiederhold, and D. Sagalowicz: ``Physical Design of Network Model Databases Using the Property of Separability''; in Eighth International Conference on Very Large Data Bases, McLeod and Villasenor (eds.), Mexico City, Sep.1982, pp.98-107.}
    183. [Whang;83a] {K.-Y. Whang: ``Physical Design Algorithms for Multiple Relational Databases''; submitted for publication, 1983.}
    184. [Whang;83b]* {K.-Y. Whang, G. Wiederhold, and D. Sagalowicz: ``Estimating Block Accesses in Database Organizations, A Closed, Non-iterative Formula''; Communications of the ACM, vol.26 no.11, November 1983, pp.940-944.}
    185. [Whang;84]* {K.-Y. Whang, G. Wiederhold, and D. Sagalowicz: ``Separability-An Approach to Physical Database Design''; IEEE Transactions on Computers, vol.c-33 no.3, March 1984, pp.209-222.}
    186. [Whang;85] {K.-Y. Whang, G. Wiederhold, and D. Sagalowicz: ``The Property of Separability and Its Application to Physical Database Design''; Query Processing in Database Systems, Kim, Batory, Reiner (eds.), Springer Verlag, 1985.}
    187. [Wiederhold;78a] {Gio Wiederhold: ``Introducing Semantic Information into a Database Schema''; Proceedings of the CIPS Session '78, Canadian Information Processing Society, September, 1978, pp.338-391.}
    188. [Wiederhold;78b]* {Gio Wiederhold: ``Management of Semantic Information for Databases''; {Stanford CSD HPP report} HPP-78-12; Proceedings of the 3rd USA-Japan Conference, Session 10-2-1, San Francisco, October 1978, pp.192-197.}
    189. [Wiederhold;79a]* {Gio Wiederhold and Ramez ElMasri: Structural Model for Database Systems; {Stanford University, Computer Science Department report} CS-79-722, April 1979.}
    190. [Wiederhold;79b] {G. Wiederhold and R. ElMasri: ``Errata for Database Design''; Database Engineering Bulletin, IEEE Technical Committee on Databases, 1979.}
    191. [Wiederhold;79c]* {Gio Wiederhold and Ramez ElMasri: ``The Structural Model for Database Design''; Proceedings of the International Conference on Entity-Relationship Approach to Systems Analysis and Design, North Holland, December 1979, pp.247-267.}
    192. [Wiederhold;79d]* {Gio Wiederhold: ``Databases for Economic Planning in India''; Management Sciences and the Development of Asian Economies, S. Torok (editor), Times Books International, Singapore, 1979.}
    193. [Wiederhold;80a] {Gio Wiederhold: Databases in Health Care; {Stanford Computer Science Dept. Report} CS-80-790, March 1980.}
    194. [Wiederhold;80b]* {Gio Wiederhold, Anne Beetem, and Garrett Short: ``A Database Approach to Communication in VLSI Design''; {Technical report} CSL-196, Stanford University, Computer Systems Laboratory, 1980.}
    195. [Wiederhold;80c] {Gio Wiederhold: ``New Technologies''; in The Computer in the Doctor's Office, Rienhoff and Abrams (eds.), North-Holland Publishing Company, IFIP, 1980, pp.263-264.}
    196. [Wiederhold;80d] {Gio Wiederhold (Translator R. Gritsch): Datenbanken, Analyse--Design--Erfahrungen; Band 1 Dateisysteme, R. Oldenbourg Verlag München 1980, Reihe Datenverarbeitung, 454 pp.}
    197. [Wiederhold;81a] {Gio Wiederhold: Databases for Health Care; {Lecture Notes in Medical Informatics no.12, Lindberg and Reichertz(eds.)}, Springer-Verlag, Berlin, Heidelberg, New York, 1981, 75 pp.}
    198. [Wiederhold;81b]* {Gio Wiederhold: Binding in Information Processing; {Stanford University Computer Science Report} STAN-CS-81-851, May 1981.}
    199. [Wiederhold;81c]* {Gio Wiederhold: ``Database Technology in Healthcare''; Journal of Medical Systems, Plenum, vol.5 no.3, Sept.1981, pp.175-196.}
    200. [Wiederhold;81d] {G. Wiederhold, J. Kaplan, and D. Sagalowicz: ``Research in Knowledge Base Management Systems''; ACM-SIGMOD Record, vol.11 no.3, Apr.1981, pp.26-54.}
    201. [Wiederhold;82a] {Gio Wiederhold, J. Kaplan, and D. Sagalowicz: ``Physical Database Design Research at Stanford''; IEEE Database Engineering Bulletin, vol.5 no.1, pp.39-41, March 1982, republished in Database Engineering, IEEE Computer Society, 1983.}
    202. [Wiederhold;82b]* {Gio Wiederhold, Anne Beetem, and Garrett Short: ``A Database Approach to Communication in VLSI Design''; IEEE Transactions on Design Automation, vol.CAD-1 no.2, April 1982, pp.57-63; to be republished in Digital VLSI Systems, M.I. Elmasry (ed.), IEEE Press, 1985.}
    203. [Wiederhold;82c]* {Gio Wiederhold: ``Databases for Ambulatory Care''; Proceedings of the Annual Conference of the American Medical Informatics Association, May 1982, pp.79-85.}
    204. [Wiederhold;82d]* {Gio Wiederhold: ``A Method for the Design of Multi-Objective Databases''; Proceedings of the Annual Meeting of the American Society of Mechanical Engineers, June 1982, pp.161-165.}
    205. [Wiederhold;82e] {G. Wiederhold(ed.): Second Symposium on Reliability in Distributed Software and Database Systems; IEEE Pub.no.82CH1792-1, 1982.}
    206. [Wiederhold;82f] {Gio Wiederhold: ``Applications of Computers in Medicine''; Encyclopedia of Computer Science, Ralston (ed.), Second edition; Van Nostrand-Reinhold 1982, pp.686-688, 934-940.}
    207. [Wiederhold;82g] {Gio Wiederhold: ``Scientific Computing: A Scientific Group of Mixed Background Attempts to Cope with a Broad Spectrum of Problems''; abstracted in Roles of Industry and the University in Computer Research and Development, National Academy Press, Washington DC, 1982, pp.60-66.}
    208. [Wiederhold;82h] {Gio Wiederhold: ``Hospital Information Systems''; Encyclopedia of Computer Science and Engineering, 2nd edition, Ralston and Reilly (eds.), Van Nostrand Reinhold Company, 1982.}
    209. [Wiederhold;82i] {Gio Wiederhold: ``Medical Applications''; Encyclopedia of Computer Science and Engineering, 2nd edition, Ralston and Reilly (eds.), Van Nostrand Reinhold Company, 1982.}
    210. [Wiederhold;82j] {Gio Wiederhold: ``Databases and File-Systems''; in The McGraw-Hill Handbook, New York, Helms (ed.), 1982, pp.19-1-99.}
    211. [Wiederhold;83a] {Gio Wiederhold: Database Design; McGraw-Hill (in the Computer Science Series) Second edition, Jan.1983, 768 pp.}
    212. [Wiederhold;83b]* {Gio Wiederhold: ``Modeling Databases''; Information Systems, vol.29, 1983.}
    213. [Wiederhold;83c] { G. Wiederhold, J. Milton, and D. Sagalowicz: The Knowledge-Based Management Systems project; Stanford University, June 1983.}
    214. [Wiederhold;83d] {Gio Wiederhold: ``Networking of Data and Information''; NCI Workshop on Role of Computers in Cancer Clinical Trials, NIH, June 1983.}
    215. [Wiederhold;83e] {Gio Wiederhold: ``Databases and Information Management''; WIS Implementation Study Report, Volume III - Background Information; Thomas H. Probert (ed.), IDA, Oct.1983, pp.75-137, 154.}
    216. [Wiederhold;83f]* {Gio Wiederhold, Jack Milton, and Daniel Sagalowicz: ``Applications of Artificial Intelligence in the Knowledge Based Management Systems Project''; IEEE Database Engineering Bulletin, vol.6 no.4, Dec.1983, pp.75-82; reprinted in IEEE Database Engineering, vol.2, 1984 Kim, Ries, Lochovsky (eds.), pp.257-264.}
    217. [Wiederhold;84a]* {Gio Wiederhold: ``Knowledge and Database Management''; IEEE Software Premier Issue, vol.1 no.1, January 1984, pp.63-73.}
    218. [Wiederhold;84b]* {Gio Wiederhold: ``Databases''; IEEE Computer, Centennial Issue, vol.17 no.10, October 1984, pp.211-223.}
    219. [Wiederhold;84c] {Gio Wiederhold: ``Databases for Statistics''; in Proceedings of the Data Base Management Conference, Naval Post-Graduate School, Monterey CA, November 1984.}
    220. [Wiederhold;84d] {Gio Wiederhold: ``A Metaphor for Distributed Databases''; Tech.report to VisiCorp, March 1984.}
    221. [Wiederhold;85a]* {Gio Wiederhold, Robert L. Blum, and Michael Walker: ``An Integration of Knowledge and Data Representation''; Proceedings of the Islamorada Workshop, Feb.1985, Computer Corporation of America, Cambridge MA; in On Knowledge Base Management Systems: Integrating Artificial Intelligence and Database Technologies (Brodie, Mylopoulos, and Schmidt, eds.), Springer-Verlag, June 1986.}
    222. [Wiederhold;85b]* {Gio Wiederhold and Paul D. Clayton: ``Processing Biological Data in Real Time''; M.D. Computing, Springer Verlag, Vol.2 No.6, November 1985, pages 16-25.}
    223. [Wiederhold;85c]* {Gio Wiederhold: ``Knowledge Bases''; Future Generations Computer Systems, North-Holland, vol.1 no.4, May 1985, pp.223-235.}
    224. [Wiederhold;85d] {Gio Wiederhold, John Smith, and Phillip Kaufman: ``Models for Engineering Information Systems''; abstract in Proceedings of the 1985 VHSIC Conference, Silver Spring, Maryland, December 1985.}
    225. [Wiederhold;86a]* {Gio Wiederhold: ``Knowledge Versus Data''; Chapter 9 of On Knowledge Base Management Systems: Integrating Artificial Intelligence and Database Technologies (Brodie, Mylopoulos, and Schmidt, eds.) Springer-Verlag, Jun.1986.}
    226. [Wiederhold;86b]* {Gio Wiederhold, Arthur M. Keller, Sham Navathe, David Spooner, Murray Berkowitz, Bill Brykczynski, and John Salasin: ``Modularization of an Ada Database System''; Stanford, February 1986; Proc.Internat.Pre-VLDB Symposium, Beijing PRC, S.B.Yao, S-H.Sa, Z-S.Zhong(eds), China Assoc.for Science and Technology, Aug.1986, pp.202-232; extended abstract in Proc.6th Adv.Database Symposium, Tokyo, Aug.1986, IPSJ, pp.135-142.}
    227. [Wiederhold;86c]* {Gio Wiederhold: ``Views, Objects, and Databases''; Stanford, March 1986; Proc. of the Workshop on Integrated Manufacturing Systems, Naval Post-graduate School, Monterey CA, 14 Apr.1986; Workshop on Object-Oriented Databases, IEEE and ACM, Sep.1986; IEEE Computer Magazine, Vol.19 no.12, December 1986, pages 37-44; reprinted in Dittrich, Dayal, Buchmann (eds.), Object-Oriented Systems, Springer Verlag, 1988.}
    228. [Wiederhold;86d]* {Gio Wiederhold: ``Standardization and Artificial Intelligence''; ASTM Standardization News, vol.14 no.4, Apr.1986, pp.10-11.}
    229. [Wiederhold;86e]* {Wiederhold, Gio: ``Knowledge and Database Design''; 11th Annual DSSD User's Conf., Ken Orr and Associates, Kansas City, Oct. 1986.}
    230. [Wiederhold;86f]* {G.Wiederhold, M.G.Walker, R.L.Blum, and S.Downs: ``Acquisition of Knowledge from Data''; Proc. Int.Symp.on Methodologies for Intelligent Systems, Oct.1986, Knoxville TN.}
    231. [Wiederhold;86g]* {Gio Wiederhold: ``The Knowledge-based Management Systems Project and Time-Oriented Problems''; ACM SIGMOD Record, Dec.1986.}
    232. [Wiederhold;86h]* {Gio Wiederhold (chairman), Murray Berkowitz, Deborah Heystek, Arthur M. Keller, Shamkant Navathe, David Spooner, and John Salasin: ``Ada Foundation Technology, Volume V: Software Requirements for WIS Database Management Prototypes''; Institute for Defense Analyses, Paper P-1893, December 1986.}
    233. [Wiederhold;87a]* {Gio Wiederhold and XiaoLei Qian: ``Modeling Asynchrony in Distributed Databases (Invited Paper)''; 3rd International IEEE Conference on Data Engineering, Feb.1987.}
    234. [Wiederhold;87b] {Gio Wiederhold: File Organization for Database Design; McGraw-Hill Book Company, New York, NY, March 1987.}
    235. [Wiederhold;87c]* {Wiederhold, Gio CM, Michael G. Walker, Waqar Hasan, Surajit Chaudhuri, Arun Swami, Sang K. Cha, XiaoLei Qian, Marianne Winslett, Linda DeMichiel, and Peter K. Rathmann: ``KSYS: An Architecture for Integrating Databases and Knowledge Bases''; Computer Science Department, Stanford University, May 1987; in Amar Gupta and Stuart Madnick (editors) Technical Opinions Regarding Knowledge-Based Integrated Information Systems Engineering, MIT, 1987, pp.61-87.}
    236. [Wiederhold;87d]* {Wiederhold, Gio: ``Chairman's statement: Panel on Experience with Knowledge Bases''; Proceedings ACM SIGMOD Conference, May 1987.}
    237. [Wiederhold;87e]* {Gio Wiederhold, Surajit Chaudhuri, Waqar Hasan, Michael G. Walker, and Marianne Winslett: ``Architectural Concepts for Large Knowledge Bases''; German Workshop on AI, Fachbericht 172, Springer Verlag, Sep.1987, pp.377-396.}
    238. [Wiederhold;87f]* {Gio CM Wiederhold, Michael G. Walker, Robert L. Blum, Stephen M. Downs, and Isabelle deZegher-Geets: ``Acquisition of Knowledge from Medical Records''; abstract in Namen, Daten Themen, Medizin I, Systems 87 conference, Munich, FRG, Oct.1987, pp.213-214.}
    239. [Wiederhold;88a]* {Wiederhold, Gio: ``Engineering Information Systems: Prospects and Problems of Integration''; Digest of Papers IEEE Spring COMPCON March 1988, pages 228-229.}
    240. [Wiederhold;88b]* {Wiederhold, Gio: ``Hospital Information Systems''; in Encyclopedia of Medical Devices and Instrumentation, vol.3, Webster,John G.(ed), Wiley 1988, pp.1517-1542.}
    241. [Wiederhold;88c] {Gio Wiederhold: editorial: Databases; IEEE Expert, vol.3 no.2, summer 1988, page 18.}
    242. [Wiederhold;88d] {Gio Wiederhold and Arthur M. Keller: ``Notes on Databases on Parallel Computers''; April 1988.}
    243. [Wiederhold;88e] {Gio Wiederhold, Arthur M. Keller, Stefano Ceri, and Witold Litwin: FAUVE, Federated Autonomous Updatable Databases, Algorithm Validation and Evaluation; Stanford working paper, August 1988.}
    244. [Wiederhold;88f] {Gio Wiederhold: ``Sharing and Partitioning Large Knowledge Bases''; abstract in H.-J. Appelrath, A.B. Cremers, and H. Schiltknecht (eds) PROTOS, PROLOG Tools for Building Expert Systems, EUREKA report EU56, Basel, Switzerland, Dec. 1988, pp.77-83.}
    245. [Wiederhold;89a]* {Gio Wiederhold, Jim Brinkley, Ramin Samadani, and C. Robert Clauer: ``Model-Driven Image Analysis to Augment Databases''; Visual Databases, T. Kunii (editor), Springer Verlag, 1989, pp.159-180.}
    246. [Wiederhold;89b]* {Gio Wiederhold, Thierry Barsalou, and Surajit Chaudhuri: ``Managing Objects in a Relational Framework''; Technical report No. STAN-CS-89-1245, Computer Science Department, Stanford University, January 1989.}
    247. [Wiederhold;89c]* {Gio Wiederhold et al: ``Future Data Management and Access''; Chapter 4 in Recommendations for the National Scientific Effort on AIDS Modeling and Epidemiology, Cohen and Layne (editors), White House Domestic Policy Council, 1989.}
    248. [Wiederhold;89d]* {Gio Wiederhold, Marianne Winslett, and Lt Nicholas Naclerio: ``Layering an Engineering Information System''; Digest of Papers, IEEE CS Spring COMPCON 34, February-March 1989, pp.444-449.}
    249. [Wiederhold;89e]* {Gio Wiederhold: ``The Architecture of Future Information Systems''; abstract only, in Proceedings of the International Symposium on Database Systems for Advanced Applications, KISS and IPSJ, Apr.1989, Seoul, Korea, pp.vii.}
    250. [WiederholdV;84] {Voy Wiederhold, Lei-Hou Chee, James Theodore, Edward Stinson, and Gio Wiederhold: ``A Comparison of Two Database Systems for Heart-Lung Transplant Data''; Computers in Cardiology, IEEE, September 18-21, 1984, Salt Lake City UT.}
    251. [Winslett;84]* {Marianne Winslett and Gio Wiederhold: ``Relational and Entity-Relationship Model Databases in VLSI Design''; IEEE Database Engineering Bulletin, vol.7 no.2, Jun.1984, pp.61-66.}
    252. [Winslett;85]* {Marianne Winslett, Richard Berlin, Thomas Payne, and Gio Wiederhold: ``Relational and Entity-Relationship Model Databases and Specialized Design Files in VLSI Design''; Proceedings of the ACM Design Automation Conference, Jun.1985, pp.410-416.}
    253. [Winslett;86a] {Marianne Winslett: ``A Model-Theoretic Approach to Updating Logical Databases''; Stanford Computer Science Dept. Technical Report, Jan. 1986.}
    254. [Winslett;86b]* {Marianne Winslett: ``A Model-Theoretic Approach to Updating Logical Databases (Extended Abstract)''; ACM Conference on Principles of Database Systems, Cambridge, Mar. 1986.}
    255. [Winslett;86c]* {Marianne Winslett: ``Is Belief Revision Harder Than You Thought?''; AAAI-86, Philadelphia, Aug. 1986.}
    256. [Winslett;86d]* {Marianne Winslett: ``Updating Logical Databases Containing Null Values''; Intl. Conf. on Database Theory, Rome, Sept. 1986.}
    257. [Winslett;87a]* {Marianne Winslett: ``Updating Databases with Incomplete Information''; PhD thesis, Department of Computer Science, Stanford University, January 1987.}
    258. [Winslett;87b]* {Marianne Winslett: ``A Model-Based Approach to Updating Databases with Incomplete Information''; submitted for publication.}
    259. [Winslett;89]* {Marianne Winslett, David Knapp, Keith Hall, and Gio Wiederhold: ``Use of Change Coordination in an Information-Rich Design Environment''; ACM Design Automation Conference, 1989.}

    The following papers are relevant to, but were not created by, KBMS research:

    1. [Bachman;72] {C.W. Bachman: ``The Evolution of Storage Structures''; CACM, vol.15 no.7, July 1972, pp.628-634.}
    2. [Bancilhon;81] {F.B. Bancilhon and N. Spyratos: ``Update Semantics of Relational Views''; ACM TODS, vol.6 no.4, Dec.1981, pp.557-575.}
    3. [Barbera;82] {D. Barbera and H. Garcia-Molina: ``How Expensive is Data Replication''; Proceedings of the Third International Conference on Distributed Computing Systems, pp.263-268, Oct.1982.}
    4. [Barr;81] {A. Barr, E. Feigenbaum, and P. Cohen (eds.): The Handbook of Artificial Intelligence, vol. I; Kaufman Publishers, Los Altos CA, 1981}
    5. [Belady;77] {L.A. Belady, P.M. Merlin: ``Evolving Parts and Relations - A Model of System Families''; IBM Research Report RC6677, Aug.1977.}
    6. [Bennett;82] {Jack Bennett: ``A Database Management System for Design Engineers''; ACM, IEEE Nineteenth Design Automation Conference Proceedings, June 14-16, 1982, pp.268-273.}
    7. [Blasgen;77] {M. Blasgen and K. Eswaran: ``Storage and Access in Relational Data Bases''; IBM Systems Journal, vol.16 no.4, 1977.}
    8. [Bleier;67] {R.E. Bleier: ``Treating Hierarchical Data Structures in the SDC Time-shared Data Management System (TDMS)''; Proceedings of the 22nd ACM National Conference 1967, MDI Publishers, Wayne PA, pp.41-49.}
    9. [Blum;83] {Robert L. Blum: ``Clinical Decision Making Aboard the Starship Enterprise''; SCAMC 83, IEEE, April 1983.}
    10. [Boyce;75] {R. Boyce, D.Chamberlin, M. Hammer and W. King: ``Specifying Queries as Relational Expressions: The SQUARE Data Sublanguage''; CACM, vol.18 no.11, Nov.1975, pp.621-628.}
    11. [Brachman;83] {Ronald J. Brachman, R.E. Fikes, and H.J. Levesque: ``KRYPTON: A Functional Approach to Knowledge Representation''; IEEE Computer, vol.16 no.10, Oct.1983, pp.67-73.}
    12. [Burkhard;76] {W.A. Burkhard: ``Hashing and Trie Algorithms for Partial Match Retrieval''; ACM TODS, vol.1 no.2, June 1976, pp.175-187.}
    13. [Ceri;84] {S. Ceri and G. Pelagatti: Distributed Databases: Principles and Systems; McGraw-Hill, 1984.}
    14. [Ceri;86] {S. Ceri and G. Gottlob: ``Normalization of Relations and Prolog''; submitted for publication, 1986.}
    15. [Chamberlin;74] {D.D. Chamberlin and R.F. Boyce: ``SEQUEL: A Structured English Query Language''; Proceedings of the ACM-SIGMOD Workshop on Data Description, Access and Control, May 1974, pp.249-264.}
    16. [Chang;76a] {C.L. Chang: ``DEDUCE: A Deductive Query Language for Relational Data Bases''; in: Pattern Recognition and Artificial Intelligence, Academic Press, 1976, pp.108-134.}
    17. [Chang;76b] {S. Chang, M. O'Brien, J. Read, A. Borovec, W. Cheng, and J.S. Ke: ``Design Considerations of a Data Base System in a Clinical Network Environment''; Proceedings of the 1976 NCC, AFIPS Press, Arlington VA, 1976, vol.45, pp.277-286.}
    18. [Chen;77] {P.P. Chen: ``The Entity-Relationship Model - A Basis for the Enterprise View of Data''; Proceedings of the 1977 NCC, AFIPS Press, Arlington VA, 1977, vol.46, pp.77-84.}
    19. [Cherubini;84] {R. Cherubini: ``X9: A System for Creating and Delivering On-line Documentation''; Internal Memorandum, Digital Equipment Corporation, 1984.}
    20. [Cline;85] {T. Cline, W. Fong, M.G. Walker, and S. Rosenberg: ``Photolithography Advisor''; SIGART Newsletter, April 1985, no.92, p.42, Special Section on AI in Engineering, Sriram and Joobani (eds.).}
    21. [CODASYL;71] {Data Base Task Group of CODASYL Programming Language Committee; ACM report, April 1971.}
    22. [CODASYL;74] {CODASYL Data Description Language, Journal of Development June 73; NBS Handbook 113, Government Printing Office, Washington DC, Jan.1974, 155pp.}
    23. [Codd;70] {E.F. Codd: ``A Relational Model of Data for Large Shared Data Banks''; CACM, vol.13 no.6, June 1970, pp.377-387.}
    24. [Codd;71] {E.F. Codd: ``A Data Base Sublanguage Founded on the Relational Calculus''; Proceedings of the ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Nov.1971, pp.35-68.}
    25. [Codd;74] {E.F. Codd: ``Seven Steps to Rendezvous with the Casual User''; in: Data Base Management, Klimbie and Koffeman(eds.), North Holland, 1974, pp.179-200.}
    26. [Cohen;74] {S.N. Cohen et al: ``Computer Monitoring and Reporting of Drug Interactions''; Proceedings of MEDINFO 1974, IFIP Congress, Anderson and Forsythe, (eds.), North-Holland, 1974.}
    27. [Duda;78] {R.O. Duda, P.E. Hart, P. Barrett, J.G. Gaschnig, K. Konolidge, R. Reboh, and J. Slocum: Development of the Prospector Consultation System for Mineral Exploration; Final Report, SRI project 5821 and 6415, Oct.1978.}
    28. [ElMasri;84] {Ramez ElMasri and Shamkant Navathe: ``Object Integration in Database Design''; IEEE Data Engineering Conference, Los Angeles, Apr.24-27, 1984.}
    29. [Everest;74] {Gordon C. Everest: ``The Objectives of Data Base Management''; Proceedings of the 4th International Symposium on Computer and Information Sciences, Plenum Press, 1974, pp.1-35.}
    30. [Finkelstein;82] {Sheldon J. Finkelstein, Mario Schkolnick, and Paul Tiberio: A Physical Database Design Tool for Relational Databases; IBM Research Report, 1982.}
    31. [FCWR;85] {W. Fong, T. Cline, M.G. Walker, and S. Rosenberg: ``An Expert System for Photolithography''; Semicon East, October 1985.}
    32. [Feigenbaum;88] {E. Feigenbaum, P. Nii, and P. McCorduck: The rise of the expert company: How visionary companies are using artificial intelligence to achieve higher productivity and profits; Times Books, 1988.}
    33. [Fries;75]* {James F. Fries, Alison Harlow, and Gio Wiederhold: ``Experience-based Computer Consultation''; IEEE Cybernetic Systems, Sept. 1975.}
    34. [Fishman;87] {D.H. Fishman et al.: ``Iris: An Object-Oriented DBMS''; ACM TOOIS, vol.5 no.1, Jan.1987, pp.48-69.}
    35. [Fries;79] {James F. Fries and Dennis McShane: ``ARAMIS, A National Chronic Disease Databank System''; Proceedings of the 3rd Symposium on Computer Applications in Medical Care, Oct.1979, Washington DC, IEEE, pp.798-801.}
    36. [Garcia;77] {H. Garcia-Molina: ``Overview and Bibliography of Distributed Databases''; Computer Science Department, Stanford University, 1977.}
    37. [GarciaMolina;81] {H. Garcia-Molina: Using Semantic Knowledge For Transaction Processing in Distributed Databases; Technical Report 285, Dept. of Electrical Engineering and Computer Science, Princeton University, April 1981.}
    38. [Germano;80] {F. Germano: Automatic Transaction Decomposition in a Distributed CODASYL Prototype System; PhD. Thesis, Wharton School, University of Pennsylvania, 1980.}
    39. [Ghosh;75] {S.P. Ghosh: ``Consecutive Storage of Relevant Records with Redundancy''; CACM, vol.16 no.8, August 1975, pp.464-471.}
    40. [Gilbert;82] {E.J. Gilbert: Algorithm Partitioning Tools for a High-Performance Multiprocessor; PhD. Thesis, Stanford University, Dec.1982.}
    41. [Grosz;77a] {B.J. Grosz: The Representation and Use of Focus in Dialog Understanding; PhD. Thesis, The University of California at Berkeley, June 1977.}
    42. [Grosz;77b] {B.J. Grosz: ``The Representation and Use of Focus in a System for Understanding Dialogs''; Proceedings of the 5th IJCAI, Cambridge MA, 1977, vol.1, pp.67-76.}
    43. [Hammer;76] {M.M. Hammer and A. Chan: ``Index Selection in a Self-Adaptive Database Management System''; Proceedings of the ACM-SIGMOD Conference, Washington DC, June 1976, pp.1-8.}
    44. [Hammer;77] {M.M. Hammer: ``Self-Adaptive Automatic Data Base Design''; Proc AFIPS 1977 NCC, pp.123-129.}
    45. [Hammer;78] {M.M. Hammer and D. McLeod: ``The Semantic Data Model, a Modelling Mechanism for Database Applications''; ACM-SIGMOD International Conference on Management of Data, Austin TX, 1978, pp.26-36.}
    46. [Hammer;80] {M.M. Hammer and D. Shipman: ``Reliability Mechanisms for SDD-1''; ACM TODS, vol.5 no.4, Dec.1980, pp.431-466.}
    47. [Harris;77] {L.R. Harris: ``ROBOT: A High Performance Natural Language Processor for Data Base Query''; ACM-SIGART Newsletter, no.61, Feb.1977, pp.39-40.}
    48. [Hayes;85] {Barbara Hayes-Roth: ``BB1: An Architecture for Blackboard Systems that Control, Explain, and Learn About Their Own Behaviour''; Stanford University Report STAN-CS-84-1034, December 1984; AI Journal, 1985.}
    49. [Heimbigner;80] {D. Heimbigner, and D. McLeod: ``A Federated Architecture for Database Systems''; Proceedings of the National Computer Conference, Anaheim CA, June 1980.}
    50. [Held;76] {G.D. Held, M.R. Stonebraker and E. Wong: ``INGRES: A Relational Data Base System''; AFIPS Conference Proceedings, vol.44, 1975, pp.409-416.}
    51. [Hendrix;75] {G.G. Hendrix: Expanding the Utility of Semantic Networks through Partitioning; SRI Artificial Intelligence Group Technical Note 105, June 1975.}
    52. [Hendrix;77] {G.G. Hendrix: ``Human Engineering for Applied Natural Language Processing''; Proceedings of the 5th IJCAI, Cambridge MA, 1977, vol.1, pp.183-191.}
    53. [Hendrix;80] {G.G. Hendrix: ``Mediating the Views of Databases and Database Users''; ACM Pingree Park Workshop, Brodie(ed.), Jun.1980, pp.131-132.}
    54. [Hodges;83] {A. Hodges: Alan Turing: The Enigma; p.361, Simon and Schuster, 1983.}
    55. [Horvitz;86] {E.J. Horvitz and D.E. Heckerman: ``Modular belief updates and the inconsistent use of measures of certainty in artificial intelligence research''; (Memo KSL-85-57); to appear in Uncertainty in Artificial Intelligence, Amsterdam: North-Holland, 1986.}
    56. [Jonsson;86] {B. Jonsson, Z. Manna and R. Waldinger: ``Towards Deductive Synthesis of Dataflow Networks''; IEEE Conference on Logic in Computer Science, Cambridge, Mass., June, 1986.}
    57. [Kaplan;79] {S.J. Kaplan: Cooperative Responses from a Portable Natural Language Data Base Query System; PhD. Dissertation, Dept. of Computer and Information Science, University of Pennsylvania, Philadelphia PA, 1979.}
    58. [Katz;84] {R.H. Katz and T.J. Lehman: ``Database Support for Versions and Alternatives of Large Design Files''; IEEE Transactions on Software Engineering, vol.SE-10 no.2, Mar.1984.}
    59. [Knuth;73] {D.E. Knuth: The Art of Computer Programming; vol.3: Sorting and Searching; Addison-Wesley, 1973.}
    60. [LaFue;82] {Gilles M.E. LaFue: ``Semantic Integrity Dependencies and Delayed Integrity Checking''; Very Large Data Bases 8, McLeod and Villasenor(eds.) 1982, pp.292-299.}
    61. [LaFue;83] {Gilles M.E. LaFue: ``Basic Decisions about Linking an Expert System with a DBMS: A Case Study''; IEEE Database Engineering Bulletin, vol.6 no.4, Dec.1983, pp.56-64; reprinted in IEEE Database Engineering, vol.2, 1984 Kim, Ries, Lochovsky(eds.), pp.238-246.}
    62. [LaFue;85] {G.M.E. LaFue and R. Smith: ``Implementation of a Semantic Integrity Manager with a Knowledge Representation System''; EDBS 1, vol.1, Oct.1983, pp.172-185.}
    63. [Lee;78] {R.M. Lee: ``Conversational Aspects of Database Interactions''; Conference on Very Large Data Bases, Berlin, Germany, 1978.}
    64. [Lehnert;77] {W. Lehnert: ``Human and Computational Question Answering''; Cognitive Science vol.1 no.1, 1977.}
    65. [Lenat;87] {D. Lenat and E. Feigenbaum: ``On the Thresholds of Knowledge''; IJCAI 87, Milan, Italy, 1987.}
    66. [Litwin;86a] {Witold Litwin and Pierre Vigier: ``Dynamic Attributes in the Multidatabase System MRDSM''; IEEE Data Engineering Conf.2, Los Angeles, Feb.1986.}
    67. [LitwinA;86b] {W. Litwin and A. Abdellatif: ``Multidatabase Interoperability''; IEEE Computer, Vol.19 No.12, Dec.1986, pp.10-18.}
    68. [LitwinA;87] {W. Litwin and A. Abdellatif: ``An Overview of the Multi-Database Manipulation Language MDSL''; Proc. IEEE, Vol.75 No.5, May 1987, pp.621-632.}
    69. [Litwin;88] {Witold Litwin: ``From Database Systems to Multidatabase Systems: Why and How''; BNCOD, 1988, Cambridge Press.}
    70. [Lucas;75] {H.C. Lucas: ``Performance and Use of an Information System''; Management Science; vol.21 no.8, April 1975, pp.908-919.}
    71. [MacKinlay;83] {J. MacKinlay: ``Intelligent Presentation: The Generation Problem for User Interfaces''; Stanford University Heuristic Programming Project Tech. Report, March 1983.}
    72. [MacKinlay;86] {J. MacKinlay: ``Using Composition to Design Presentations Automatically''; Stanford University Computer Science Dept. Tech. Report, March 1986.}
    73. [Malone;84] {T.W. Malone and S.A. Smith: Tradeoffs in Designing Organizations: Implications for Human Organizations and Computer Systems; MIT, Sloan School of Management, WP No.1541-84, 1984.}
    74. [Malone;87] {T.W. Malone, K.R. Grant, F.A. Turbak, S.A. Brobst, and M.D. Cohen: ``Intelligent Information-Sharing Systems''; CACM, Vol.30 No.5, May 1987, pp.390-402.}
    75. [Manna;86] {Z. Manna and R. Waldinger: ``How to Clear a Block: Plan Formulation in Situational Logic''; Conference on Automated Deduction, 1986.}
    76. [Marill;75] {T. Marill and D. Stern: ``The Datacomputer, A Network Data Utility''; Proceedings of the 1975 NCC, AFIPS, vol.44, pp.389-395.}
    77. [Martin;77] {N. Martin, P. Friedland, J. King, and M.J. Stefik: ``Knowledge Base Management for Experiment Planning''; Stanford University, Heuristic Programming Project Paper HPP-77-19 Report, August 1977.}
    78. [McCune;85] {B.P. McCune, R.M. Tong, J.S. Dean, and D.G. Shapiro: ``RUBRIC: A System for Rule-based Information Retrieval''; IEEE TSE, Vol.SE-11 No.9, Sep.1985, pp.939-945.}
    79. [McDermott;75] {D.V. McDermott: Very Large Planner-type Data Bases; M.I.T. Artificial Intelligence Laboratory Memo 339, September 1975.}
    80. [Montgomery;72] {C.A. Montgomery: ``Is Natural Language an Unnatural Query Language''; Proceedings of the 27th Conference, ACM, 1972, pp.1075-1078.}
    81. [Moore;78] {R.C. Moore: Handling Complex Queries in a Distributed Data Base; Technical Report, Artificial Intelligence Center, SRI International, 1978.}
    82. [Mylopoulos;75] {J. Mylopoulos and N. Roussopoulos: ``Using Semantic Networks for Data Base Management''; Conference on Very Large Data Bases, Framingham MA, Sept.1975, pp.144-172.}
    83. [Mylopoulos;85] {J. Mylopoulos, A. Borgida, S. Greenspan, and H.K.T. Wong: ``Information System Design at the Conceptual Level - The TAXIS Project''; IEEE Database Engineering Bulletin, vol.7 no.4, Dec.1984, pp.4-9.}
    84. [Nakane;86]* {Yasuaki Nakane, Toshiro Morimoto, and Tadashi Otsuki: ``Optical Write-once Disk Subsystem with Higher Data Integrity''; 1986 (in Japanese)}
    85. [Navathe;82] {Shamkant B. Navathe and S.G. Gadgil: ``A Methodology for View Integration in Logical Database Design''; Conference on Very Large Data Bases 8, McLeod and Villasenor(eds.), Mexico City, Sep.1982, pp.142-164.}
    86. [Nava;84b] {S.B. Navathe, T. Sashidhar, and R. ElMasri: ``Relationship Merging in Schema Integration''; Proceedings of the 10th VLDB Conference, Singapore, Aug.1984.}
    87. [Nii;86] {H.P. Nii: ``Blackboard Model of Problem Solving and the Evolution of Blackboard Architectures''; AI Mag., Vol. 7 No. 2, Sum. 1986, pp.38-53.}
    88. [Novak;76] {G.S. Novak: ``Computer Understanding of Physics Problems Stated in Natural Language''; American Journal of Computational Linguistics, Microfiche 53, 1976.}
    89. [Parnas;76] {David L. Parnas: ``On the Design and Development of Program Families''; IEEE Transactions on Software Engineering, vol.SE-2 no.1, March 1976.}
    90. [Patel-Schneider;84] {Peter F. Patel-Schneider, Hector J. Levesque, and Ronald J. Brachman: ARGON, Knowledge Management meets Information Retrieval; Technical Report, Fairchild Laboratory for AI Research, Palo Alto CA, received May 1984.}
    91. [Petrick;76] {S.R. Petrick: ``On Natural Language Based Computer Systems''; IBM Journal of Research and Development, vol.20 no.4, July 1976.}
    92. [Reddy;76] {Raj Reddy, L. Erman, R. Fennel, and R. Neely: ``The HEARSAY Speech Understanding System: An Example of the Recognition Process''; IEEE Transactions on Computers, vol.C-25, 1976, pp.427-431.}
    93. [Rivest;76] {R.L. Rivest: ``On Self-organizing Sequential Search Heuristics''; CACM, vol.19 no.2, February 1976.}
    94. [Rochkind;75] {Marc J. Rochkind: ``The Source Code Control System''; IEEE Transactions on Software Engineering, vol.SE-1 no.4, Dec.1975.}
    95. [Roussopoulos;77] {N. Roussopoulos: A Semantic Network Model of Databases; PhD. Thesis, Computer Science Department, University of Toronto, 1977.}
    96. [Rowe;76] {Neil Rowe: ``Grammar as a Programming Language''; Creative Computing, January-February 1978, pp.80-86.}
    97. [Rowe;79b] {Neil Rowe: ``Sine POLYS: Some Geometrical Explorations''; Creative Computing, December 1979, pp.92-95.}
    98. [Rowe;80a] {Neil Rowe: ``Inductive Common Ground Tutoring''; Computers and Education, vol.4 no.2, 1980, pp.177-187.}
    99. [Rowe;80b] {Neil Rowe: ``Property List Structures''; Creative Computing, October 1980, pp.126-130.}
    100. [Rowe;81b] {Neil Rowe: ``Some Rules for Good Simulations''; Educational Computer, November-December 1981, pp.37-40.}
    101. [Rowe;82g] {Neil C. Rowe: ``More POLYS''; Creative Computing, April 1982.}
    102. [Sacerdoti;77a] {E.D. Sacerdoti: ``Language Access to Distributed Data With Error Recovery''; Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge MA, Aug.1977.}
    103. [Sacerdoti;77b] {E.D. Sacerdoti: A Structure for Plans and Behavior; Elsevier North-Holland, 1977.}
    104. [Sagalowicz;77] {D. Sagalowicz: ``IDA: An Intelligent Data Access Program''; Conference on Very Large Data Bases 3, Tokyo, Japan, Oct. 1977.}
    105. [Schkolnick;77] {M. Schkolnick: ``A Clustering Algorithm for Hierarchical Structures''; ACM TODS, vol.2 no.1, March 1977, pp.27-44.}
    106. [Selinger;79] {P. Selinger et al: Access Path Selection in a Relational Database Management System; IBM Technical Report RJ2429, 1979.}
    107. [Shipman;81] {D.W. Shipman: ``The Functional Data Model and the Data Language DAPLEX''; ACM TODS, vol.6 no.1, Mar.1981, pp.140-173.}
    108. [Shneiderman;73] {B. Shneiderman: ``Optimum Data Base Reorganization Points''; CACM, vol.16 no.6, June 1973, pp.362-365.}
    109. [Shortliffe;73] {E. Shortliffe, S.G. Axline, B.G. Buchanan, T.C. Merigan and S.N. Cohen: ``An Artificial Intelligence Program to Advise Physicians Regarding Antimicrobial Therapy''; Computers and Biomedical Research, vol.6 no.6, Dec.1973, pp.544-560.}
    110. [Shortliffe;76] {E.H. Shortliffe: Computer-Based Medical Consultations: MYCIN; American Elsevier, 1976.}
    111. [Shortliffe;79] {E.H. Shortliffe, B.G. Buchanan, and E.A. Feigenbaum: ``Knowledge Engineering for Medical Decision Making: A Review of Computer-Based Decision Aids''; Proceedings of the IEEE, vol.67 no.9, 1979, pp.1207-1223.}
    112. [Sibley;76] {E.H. Sibley and J.P. Fry: ``Evolution of Data Base Management Systems''; ACM Computing Surveys, vol.8 no.1, March 1976, pp.7-42.}
    113. [Smith;77] {J.M. Smith and D.C.P. Smith: ``Database Abstractions: Aggregation and Generalization''; ACM TODS, vol.1 no.1, Jun.1977, pp.105-133.}
    114. [Snodgrass;87] {R. Snodgrass: ``The Temporal Query Language TQuel''; ACM TODS, Vol.12 No.2, June 1987, pp.247-298.}
    115. [Sowa;76] {J.F. Sowa: ``Conceptual Graphs for a Data Base Interface''; IBM Journal of Research and Development, vol.20 no.4, July 1976, pp.336-357.}
    116. [Sprague;86] {R.H. Sprague and B.C. McNurlin: Information Systems Management in Practice; Prentice-Hall, 1986, 501pp.}
    117. [Steel;74] {T.B. Steel, Jr.: ``Data Base Systems - Implications for Commerce and Industry''; in: Data Base Management Systems, Jardine, ed., North Holland 1974.}
    118. [Steel;75] {T.B. Steel, Jr.: ANSI/X3/SPARC Study Group on Data Base Management Systems, Interim Report 75-02-08; FDT (ACM-SIGMOD), vol.7 no.2, 1975.}
    119. [Stocker;77] {P.M. Stocker: ``The Structuring of Data Bases at the Implementation Level''; in Architecture and Models in Data Base Management Systems, Nijssen, (ed.), North-Holland, 1977, pp.261-276.}
    120. [Taylor;74] {R.W. Taylor: ``When Are Pointer Arrays Better Than Chains''; Proceedings of the 1974 National Conference, Nov.1974.}
    121. [Thompson;75] {F.B. Thompson and B.H. Thompson: ``Practical Natural Language Processing: the REL System as Prototype''; in: Advances in Computers 13, Rubinoff and Yovits, (eds.), Academic Press, 1975, pp.109-168.}
    122. [Thompson;83] {Craig W. Thompson, Kenneth M. Roth, Harry R. Tennant, and Richard M. Saenz: ``Building Usable Menu-Based Natural Language Interfaces to Databases''; 9th Conference on Very Large Data Bases, 1983, pp.43-55.}
    123. [Tuel;78] {W.G. Tuel: ``Reorganization Points for Linearly Growing Files''; ACM TODS, vol.3 no.1, Sept.1978, pp.32-40.}
    124. [Walker;77] {D.E. Walker et al.: ``An Overview of Speech Understanding Research at SRI''; Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge MA, Aug.1977.}
    125. [Waltz;75] {D.L. Waltz: ``Natural Language Access to a Large Database: An Engineering Approach''; Proceedings of the 4th International Joint Conference on Artificial Intelligence, Tbilisi USSR, Sept.1975, pp.868-872.}
    126. [Warren;81] {David H.D. Warren: ``Efficient Processing of Interactive Relational Database Queries Expressed in Logic''; VLDB 7, Zaniolo and Delobel (eds.), Cannes, France, Sep.1981, pp.272-281.}
    127. [Warren;82] {David H.D. Warren and Fernando C.N. Pereira: ``An Efficient Easily Adaptable System for Interpreting Natural Language Queries''; American Journal of Computational Linguistics, vol.8 no.3-4, July-Dec.1982, pp.110-122.}
    128. [Webber;84] {Bonnie Webber and T. Finin: ``In Response: Next Steps in Natural Language Interaction''; Artificial Intelligence Applications for Business, Reitman (ed.), Ablex 1984, pp.211-234.}
    129. [Weyl;75]* {S. Weyl, J. Fries, G. Wiederhold, and F. Germano: ``A Modular Self-Describing Databank System''; Computers and Biomedical Research, vol.8, 1975, pp.279-293.}
    130. [Wiederhold;75]* {G. Wiederhold, J.F. Fries and S. Weyl: ``Structured Organization of Clinical Data Bases''; Proceedings of the 1975 NCC, AFIPS vol.44, pp.479-486.}
    131. [Wiederhold;77] {Gio Wiederhold: Database Design; McGraw-Hill, 1977.}
    132. [Woods;73] {W.A. Woods: ``Progress in Natural Language Understanding, An Application to Lunar Geology''; Proceedings of the 1973 NCC, AFIPS, vol.42, pp.441-450.}
    133. [Yao;76] {S.B. Yao, K.S. Das, and T.J. Teorey: ``A Dynamic Database Reorganization Algorithm''; ACM TODS, vol.1 no.2, June 1976, pp.159-174.}
    134. [Yao;78] {S.B. Yao and D. DeJong: ``Evaluation of Database Access Paths''; Proceedings of the ACM-SIGMOD, Austin TX, May-June 1978, pp.66-77.}
    135. [Yao;79] {S.B. Yao: ``Optimization of Query Evaluation Algorithms''; ACM TODS, vol.4 no.2, Jun.1979, pp.133-155.}
    136. [Zloof;75] {M. Zloof: ``Query by Example''; Proceedings of the 1975 NCC, AFIPS vol.44, 1975, pp.431-438.}

    Stuff to take care of

    Request from Gio, 5-12-86: I am happy to tell you that my paper "Subsumption and Implication", written last summer at Stanford and sponsored by DARPA Contract N39-84-C-211, has been accepted for publication in Information Processing Letters. George Gottlob.
    We should probably start a KSYS section: ``A system to experiment with knowledge-base/database interaction''. We have performed experiments using PROLOG to develop and validate methods for acquiring database information into a knowledge base. Ground rules in PROLOG are brought, before or during execution, into the knowledge base. A `Tracker' records the portions already placed in the knowledge base so that repeated database accesses are avoided. Once all data relating to a predicate have been placed, all further database accesses for that predicate are blocked by inserting a negative rule; a sketch of this scheme follows below. Another relevant reference is the paper by Ceri, Gottlob, and me, with PROLOG in the title; I assume it is in the bibliography but not referenced here.
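
    The following is a minimal PROLOG sketch of that caching scheme, not the project's actual KSYS code; the database access predicate db_fetch/1 and the other predicate names are illustrative assumptions.

        :- dynamic cached/1.      % ground facts already in the knowledge base
        :- dynamic complete/1.    % predicates whose extension is fully cached

        % lookup(+Goal): answer from the knowledge base when possible;
        % otherwise fetch from the database and track the result.
        lookup(Goal) :-
            cached(Goal).
        lookup(Goal) :-
            functor(Goal, Name, Arity),
            \+ complete(Name/Arity),     % the `negative rule': a complete
                                         % predicate never touches the database
            db_fetch(Goal),              % assumed database access predicate
            \+ cached(Goal),             % Tracker: skip facts already cached
            assertz(cached(Goal)).

        % Once every tuple for a predicate has been transferred, insert the
        % blocking fact so all further database accesses are avoided.
        mark_complete(Name/Arity) :-
            assertz(complete(Name/Arity)).

    With this scheme a query such as lookup(employee(X, Dept)) first enumerates cached answers, consults the database only for facts not yet tracked, and, after mark_complete(employee/2), never reaches the database again.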