A goal in information systems interoperability research is tight coupling: to be able to construct from a collection of databases a system which appears to the user as a single database, seamlessly integrating the component databases. The critical problem in achieving interoperability is semantic heterogeneity. This paper argues from the perspective of text databases that the conditions under which semantic heterogeneity can be overcome are very special. Further, the operational conditions under which tight coupling can be achieved are also very special. As a consequence, the goal of tight coupling is often not possible, and sometimes not even desirable. However, a limited tight coupling can be achieved by following the exchanges of information among organizations in their normal course of business, although the resulting network of limited tight couplings does not necessarily result in a useful global schema.
In today's networked world it seems necessary to have some way of establishing how high the cost of gaining information about a specific subject may be, and to be able to weigh whether the 'worth' of the knowledge relative to its 'cost' is high enough for it to be taken into account and acquired. For this I propose that it is necessary to have some way to calculate this relation and to keep the result on hand for future knowledge acquisition decisions. It is unnecessary to keep all knowledge on hand locally unless a third weight, which should also be calculated, the 'frequency' of needed access to the information, is sufficiently high in comparison to the worth-versus-cost relation to make local storage necessary. Further weights which should be taken into account are the persistence of the information in the distributed sources and its rate of change. The process of acquiring all the relevant weights is part of the process of knowledge abstraction.
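As a rough illustration of the weighting scheme described above, the following sketch decides whether a piece of remote knowledge should be acquired at all and whether it should be cached locally, from its 'worth', 'cost', 'frequency', and rate-of-change weights. All names, formulas, and thresholds here are illustrative assumptions, not part of the proposal itself:

```python
# Hypothetical sketch of the worth/cost/frequency weighting described
# above. The decision formulas and the threshold value are invented
# assumptions for illustration.
from dataclasses import dataclass

@dataclass
class KnowledgeSource:
    worth: float       # estimated value of the knowledge
    cost: float        # cost of acquiring it over the network
    frequency: float   # expected accesses per unit time
    volatility: float  # rate of change of the information (0 = static)

def should_acquire(src: KnowledgeSource, threshold: float = 1.0) -> bool:
    """Acquire only if worth sufficiently outweighs cost."""
    return src.worth / src.cost >= threshold

def should_cache_locally(src: KnowledgeSource) -> bool:
    """Cache only if frequent access amortises the acquisition cost and
    the information is persistent enough not to go stale quickly."""
    return should_acquire(src) and src.frequency * src.cost > src.volatility

rare = KnowledgeSource(worth=2.0, cost=1.0, frequency=0.01, volatility=0.5)
hot = KnowledgeSource(worth=2.0, cost=1.0, frequency=10.0, volatility=0.5)
```

Under this toy rule, `hot` is worth caching while `rare` is acquired on demand only.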
We are considering doing a paper relating the use of I3 in a
project called Dual-use Technology Insertion Decision Support System
(DTIDS), a project we are doing for the Air Force related to the
125,000-vehicle Air Force ground fleet, the move to transition that
fleet as quickly as possible to alternate-fueled vehicle (AFV)
scenarios, and the complexity of managing the information for
acquisition, technology insertion, and system performance measures, to
name a few. The AFV System Program Manager, Carl Perazzola of Warner Robins
AFB, is very tied into ARPA projects in support of his interests, and
is using our Program Management skills and technologies to extend his
own management capability. I have seen a natural fit for I3, and have
asked Sham Navathe and his people to tune their work in support of
this effort. I have also apprised Dave Gunning of the situation, and
we are considering an I3 Workshop for this Program at Warner Robins in
the near future.
There's just a little feedback for you on what I consider to be a great
transition of technology from I3 to a very large Air Force need.
Beyond ground vehicles, my Section, which is primarily interested in
Embedded Aircraft Applications and their support environments, is
looking for applications of I3 in these areas as well. I know from
its inception that I3 addresses issues related to the F-22. My
organization, at the Directorate Level, Wright Laboratory Avionics
Directorate in particular under Dr. Jesse Ryles and deputy Colonel
Lewantowicz, is very interested in Technology Fusion. All of the
programs under the Directorate are being tasked with looking at key
technologies that are Fusion in nature.
We believe that the I3 technologies are definitely key Fusion
Technologies. We request your support in building this case, and
potentially moving I3 more aggressively into avionics applications.
That starts with what you are so good at, dialogue and suggestions for
collaborations.
The paper presents an approach to resolving semantic heterogeneity in multidatabases which relies upon a knowledge-based ontology. The ontology is built around a core of about 250 concepts to which domain-specific layers are added as data sources about those domains are required. The terms in the ontology are used to express queries in a data-source-independent, first order logic representation language with constrained second order extensions. Data sources are also described within this representation language. Queries are expanded and normalised on the basis of user and task models before both simple and heuristic matching is performed between queries and data source descriptions in order to identify target data sources and local identifiers for the query variables. Following this conceptual interpretation, queries for each target data source are constructed at a logical level, where processing constraints (time and cost of the retrieval itself) are applied to optimise performance. After queries are dispatched and data returned, remaining conflicts are resolved, data are normalised, and they are integrated into HyTime templates which grow a hypermedia web for the presentation of retrieved multimedia data to the user. The architecture of the Multimedia Information Presentation System (MIPS) which has implemented this approach is described, along with discussions of the relative complexity of both the implemented system and the supporting methods to use and maintain it. Measures of performance show the relative runtime cost of each part of the algorithm. In particular, the tradeoffs made between end-user and system-support effort, and between run-time and off-line effort, are debated for such a general and therefore evolving system.
With multiple heterogeneous information sources accessible in modern networked environments, users and applications cannot be expected to keep up with all their specific languages, organizations, contents, and network locations. Yet the need for the data available from these sources remains. Users and applications would benefit greatly from the ability to formulate queries in a language that is free from any reference to specific information sources. SIMS is a system that supports such querying. This paper describes how SIMS can reformulate a high-level query that describes only the desired information into a query that makes explicit the information sources that need to be accessed and the data that need to be retrieved from each. To perform this task, SIMS uses models (declarative descriptions of the application domain and the available information sources) and reformulation operators that successively rewrite portions of the query until all needed information sources are made explicit. This approach provides a flexible and extensible system for integrating heterogeneous information sources. We have demonstrated the feasibility and effectiveness of this approach by applying SIMS in the domains of transportation planning and medical trauma care.
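The reformulation idea above can be caricatured as a rewrite loop; the rules, predicates, and source names below are invented for illustration and are not SIMS's actual operators:

```python
# Toy illustration of source-independent query reformulation: rewrite
# rules replace domain-level predicates with source-specific ones until
# every predicate names a concrete source. All predicates, rules, and
# source names here are invented assumptions.

# Each rule maps a domain predicate to a list of source-level predicates.
REWRITE_RULES = {
    "flight(origin, dest)": ["db1.routes(origin, dest)"],
    "capacity(dest)": ["db2.airports(dest, runway_len)",
                       "db2.limits(runway_len, capacity)"],
}

def reformulate(query: list) -> list:
    """Successively rewrite predicates until all sources are explicit."""
    result = []
    for pred in query:
        if pred in REWRITE_RULES:
            # recurse in case a rewrite introduces further domain predicates
            result.extend(reformulate(REWRITE_RULES[pred]))
        else:
            result.append(pred)  # already source-specific
    return result

plan = reformulate(["flight(origin, dest)", "capacity(dest)"])
```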
A new generation of information systems that integrates knowledge base
technology with database systems is presented for providing
cooperative (approximate, conceptual, and associative) query
answering. Based on the database schema and application
characteristics, data are organized into Type Abstraction Hierarchies
(TAHs). The higher levels of the hierarchy provide a more abstract
data representation than the lower levels. Generalization (moving up
in the hierarchy), specialization (moving down the hierarchy), and
association (moving between hierarchies) are the three key operations
in deriving cooperative query answers. Based on the context, the TAHs
can be constructed automatically from databases. An intelligent
dictionary/directory in the system lists the location and
characteristics (such as context and user type) of the TAHs, allowing
the system and user to select the appropriate one for relaxation. A
knowledge editor is also provided to browse and edit the TAHs as well
as the relaxation parameters. CoBase also provides a relaxation
manager to control query relaxation, and an explanation
system to describe the relaxation and association
processes and to report the quality of the relaxed answers.
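As a rough sketch of how generalization and specialization over a TAH support relaxation, consider the following toy hierarchy. The values and the hierarchy itself are invented for illustration; CoBase derives its TAHs from the database:

```python
# Minimal Type Abstraction Hierarchy sketch: generalization moves a
# query value up to a more abstract node, specialization enumerates the
# more specific values under it. The airport hierarchy is invented.

# child -> parent edges of one TAH
PARENT = {
    "SFO": "medium_airport", "OAK": "medium_airport",
    "LAX": "large_airport",
    "medium_airport": "airport", "large_airport": "airport",
}

def generalize(value: str) -> str:
    """Move one level up the hierarchy (query relaxation)."""
    return PARENT.get(value, value)

def specialize(value: str) -> list:
    """Enumerate the direct children of an abstract value."""
    return sorted(c for c, p in PARENT.items() if p == value)

# Relaxing an over-specific query: if no data exist for SFO,
# ask about all medium airports instead.
relaxed = specialize(generalize("SFO"))
```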
CoBase uses a mediator architecture to provide scalability and
extensibility. Each cooperative module, such as relaxation,
association, explanation, and TAH manager, is implemented as a
mediator. Further, an intelligent directory mediator is provided
to direct mediator requests to the appropriate service mediators.
CoBase mediators have uniform interface specifications and are
interconnectable to perform joint tasks. Mediators communicate
with each other via KQML. A CoBase ontology for KQML is
presented in the paper.
A Geographical Information System is developed on top of CoBase.
Queries can be specified graphically and incrementally on maps,
greatly improving querying capabilities. Further, the relaxation
process can be visualized graphically on the map. For ease in
technology transfer, CoBase was implemented in C++ and uses CLIPS for
knowledge representation. CoBase has been demonstrated for answering
imprecise queries for transportation applications and for matching
medical image (X-ray, MRI) features. It has also been used for schema
integration of heterogeneous databases, and for matching radar emitter
signals to locate platforms. Measurements as well as metrics
regarding scalability and extensibility will also be presented.
Vertical information management (VIM) supports decision makers
working within various levels of a management hierarchy, seeking
information from potentially large, distributed, heterogeneous, and
federated information sources. Decision makers are often overwhelmed
by the volume of data which may be relevant and collectible, but
overly detailed (e.g., from the breadth of open source data).
Yet, the collected information must maintain its pedigree to allow
access to detail on demand. VIM structures a top-down query refinement
and bottom-up information collection process. Our approach explicitly
represents the abstract solution which is used to generate a
representation-dependent solution to the information request. One
assumption of this work is that high-level information requests may
involve data that is extracted or derived from underlying information
sources, as well as data that is not present in the underlying
information sources (referred to as "gaps"). For a high-level
information request to be issued, a more detailed specification using
the representation-dependent components of the framework must
be utilized.
The VIM framework has been developed in the context of the ARPA I3
program in that we provide:
i) support for access to independent, heterogeneous information
sources without the use of a complete global schema [1, 7],
ii) query formulation services [7],
iii) a recognized manual element for integration of underlying
data [2, 6], and iv) operation in the context of incomplete
or missing data [5]. However, the scope of our work is more narrow
than the scope of the ARPA I3 program; we focus on supporting
read-only access to the underlying sources, and on capturing the
process used to derive the high-level information, as opposed to
focusing on the automated delivery of information.
We propose to use a collection of mediators
from the I3 program [3, 6] and from ongoing work on context
interchange [4] to accomplish the semantic and representational
resolution between the data of interest, and the data contained in the
particular underlying data sources. Contributions of
the VIM framework include:
o separation of semantics from representation; a high-level
information request, and the way it is to be constructed from
base data, may be specified without the burden of the
representational detail for the actual underlying data, and
without being limited to the data directly stored in the
underlying information sources,
o reusability of high-level information requests against different
underlying data sources,
o provision of an elegant interface between the data integration
problem and the information derivation problem; this makes the
larger problem of providing decision makers with useful
information more understandable,
o composability, allowing existing specifications to be pieced
together for increasingly complex requests for information, and
o derivation of defensible data.
This work is motivated by the demands for arbitrary high-level
information about environmental restoration and remediation on a
regional and national scale. This paper will describe the VIM
framework with a focus on specification of the steps for derivation
of the high-level information from base data in terms
of both the abstract, representation-independent and the
representation-based components. We will also describe the prototype
for specification and execution of high-level information requests.
References
[1] Chawathe, S., Garcia-Molina, H., Hammer, J., Ireland, K.,
Papakonstantinou, Y., Ullman, J., and Widom, J. (1994).
"The TSIMMIS Project: Integration of Heterogeneous Information
Sources" in Proceedings of the 100th IPSJ Anniversary Meeting,
Tokyo, Japan.
[2] Papakonstantinou, Y., Garcia-Molina, H., and Widom, J. (1995).
"Object Exchange Across Heterogeneous Information Sources" to
appear in ICDE 95.
[3] Papakonstantinou, Y., Garcia-Molina, H., and Ullman, J. (1995).
"MedMaker: A Mediation System Based on Declarative Specifications
(Extended Version)" submitted for publication 1995, available at
http://www-db.stanford.edu/~yannis/yannis-papers.html.
[4] Sciore, E., Siegel, M., and Rosenthal, A. (1994). "Using Semantic
Values to Facilitate Interoperability Among Heterogeneous
Information Systems", ACM Transactions on Database Systems
vol. 19-2.
[5] Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., and Widom, J.
(1994). "Querying Semistructured Heterogeneous Information"
unpublished memorandum, CSD, Stanford U,
anonymous ftp as pub/quass/1994/querying-submit.ps.
[6] Wiederhold, G. (1992). "Mediators in the Architecture of Future
Information Systems" IEEE Computer vol. 25-3, pp. 38-49.
[7] "Reference Architecture" a draft developed in the
November '94 ARPA Intelligent Integration of Information
Workshop, available at
http://www.cs.colorado.edu/~dbgroup/i3-ref-arch.html.
The need for mappings between different data representations is widespread in the domain of Computer-Aided Design (CAD). Mapping is required to maintain compatibility between different versions of an evolving schema as well as for more complex problems involving semantic heterogeneity, such as support for multiple data views and tool interoperability. Currently, no well-established methodology exists to guide the construction of mappings between incompatible representations. This paper will describe the Tool Integration Package (TIP) currently under development at the University of Manchester. TIP is built upon a generic procedural interface (GPIC) that provides highly flexible database access as well as the means for meta-level facilities. The construction of mappings in TIP is based on a set of generic operators that a user can apply to provide structural constraints on the mapping functions. A number of modules will also be described that provide automated support for mapping construction, including name and structure analysis of the schemas being mapped between. The use of TIP ultimately leads to the encoding of a set of application-specific mapping functions. Theoretical issues concerning the run-time management of these mapping functions will also be discussed. The paper will be illustrated by a number of practical examples, including a tool interoperability application carried out in collaboration with a major CAD vendor.
The high complexity and inherent heterogeneity of real-world problems remain among the major challenges for advanced information processing systems. Owing to the necessity of using different problem-solving techniques, hybrid systems have become a fast-growing research area, as the many recently published approaches show. To support the integration of intercommunicating hybrids, this paper suggests the use of distributed AI (DAI) techniques. The main advantages of this approach are the encapsulation of different paradigms, the separation of control and domain knowledge, and the reduction of the complexity of individual problem solvers. After a discussion of the technologies used in our application (knowledge-based systems, neural networks), we conclude Chapter 2 by reviewing related work. Because of the special importance of DAI for our argument, we examine in Chapter 3 issues and research directions in this field and conclude that chapter with the presentation of a view of DAI as an integrative paradigm. In a case study we show how neural networks can be integrated into a more general problem-solving framework. In particular, architectural aspects are discussed.
While the Internet provides an extensive and rich collection of geospatial resources (i.e., geographic information and services), the ability to find and retrieve relevant information effectively is not currently provided. Specific challenges in this domain include: a large number of disparate collections of data and service-based resources without a cohesive method of describing and classifying them; a need for spatial reasoning and analysis for generation of desired products, which may require a specific access protocol to multiple resources and post-processing within a Geographic Information System (GIS); a need for acquiring a geospatial product without the need for a local GIS; difficulty in finding the appropriate geospatial resource(s) for a given request; and a need for an application-level geospatial retrieval product in place of a standard browser application. We are constructing a prototype architecture for the Geospatial Resource Broker that addresses some of these challenges. This architecture includes: a resource directory service capable of capturing meta-information about services, pointers to meta-data where available, and meta-knowledge about resource content; a mediator that captures and maintains resource descriptions; a collection of cooperating agents, including accessors that know the nuances of resource structures and retrieval mechanisms, and GIS product generators that can import, manipulate, and export spatial data; and facilitators that provide intermediate services to handle a spatial request by orchestrating resource directory access and agent invocation. An initial prototype will demonstrate the ability to provide information about spatial resources and retrieve information through this intelligent broker service. It will also clarify the types of meta-data, information, and knowledge needed to support the geospatial domain.
For achieving interoperability among heterogeneous computing systems, the Object Management Group (OMG) proposed the use of an Interface Definition Language (IDL) for specifying object properties and operations which encapsulate the data and programs of heterogeneous systems. Although IDL is suitable for achieving program interoperability, its underlying object model is that of C++ and lacks the semantic expressiveness needed for capturing the complex structural properties and constraints found in many application data. For achieving product model and data exchange, the ISO/STEP community has introduced the information modeling language EXPRESS. In comparison with IDL, EXPRESS is richer in semantics. It allows the definition of more complex data types and their constraints. However, EXPRESS, though having an object-oriented flavor, is not an OO language. It does not capture the behavioral properties of data and is still semantically weak in expressing many association types and constraints of complex objects. This paper describes a common language which integrates the features of IDL and EXPRESS as well as the features of the association type and knowledge rule specifications offered by the Object-oriented Semantic Association Model (OSAM*). This common language, named the NIIIP Common Language (NCL), is a part of the R&D efforts of the project called the National Industrial Information Infrastructure Protocols (NIIIP). NCL is to be used for modeling all things of interest in a heterogeneous network system in terms of their 1) structural properties (attributes and association types), 2) behavioral properties (method specifications), and 3) semantic constraints (event-condition-action rules).
Its design conforms as much as possible to the two standard languages IDL and EXPRESS, and its implementation is based on a language mapping to an extensible programming language K.3 whose processing is supported by an extensible object-oriented knowledge base management system, OSAM*.KBMS. In NCL, frequently used simple constraints are specified by keywords, association types among object classes are specified in association specifications, behavioral properties of objects are defined in method specifications, and complex semantic constraints of various types are specified by ECA rules. Keyword and association type specifications are translated into ECA rules for processing by the rule processor of the KBMS. Additional semantic properties found in a heterogeneous environment can be easily introduced into NCL due to the extensibility feature of K.3 and OSAM*.KBMS. In this paper, we shall show how such an enriched object model and its language can be used to extend the OMG's ORB functionalities for information sharing and program interoperability in a heterogeneous environment.
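The event-condition-action rule processing described above can be sketched roughly as follows; the rule content (a clamp-to-zero constraint) is invented for illustration, and this is not the KBMS's actual rule processor:

```python
# Bare-bones sketch of event-condition-action (ECA) rule processing:
# each rule fires on an event, checks a condition over the affected
# object, and runs an action. Rule content is an invented example.
from typing import Callable

class ECARule:
    def __init__(self, event: str,
                 condition: Callable[[dict], bool],
                 action: Callable[[dict], None]):
        self.event, self.condition, self.action = event, condition, action

class RuleProcessor:
    def __init__(self):
        self.rules = []

    def register(self, rule: ECARule) -> None:
        self.rules.append(rule)

    def signal(self, event: str, obj: dict) -> None:
        """Fire every rule whose event matches and whose condition holds."""
        for rule in self.rules:
            if rule.event == event and rule.condition(obj):
                rule.action(obj)

# A simple keyword-style constraint compiled down to an ECA rule:
# on update, clamp a negative quantity to zero.
proc = RuleProcessor()
proc.register(ECARule(
    event="update",
    condition=lambda o: o["quantity"] < 0,
    action=lambda o: o.update(quantity=0),
))
part = {"name": "bolt", "quantity": -5}
proc.signal("update", part)
```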
In our report, we will discuss the problem solving architecture of
Neper* Wheat, an expert system developed for the management of
irrigated wheat in Egypt. Neper Wheat combines the products of expert
systems research and crop modeling into a problem solving architecture
that addresses the various aspects of wheat crop management. This
includes varietal selection, planting/harvest date selection, sowing
parameters decisions, insect/disease/weed identification and
remediation, irrigation/fertilization management and harvest
management. Specifically, from the field of expert systems research,
we adopt the Generic Task Approach to expert systems development
pioneered by Chandrasekaran et al. (1986), as well as the Knowledge
Level Architecture ideas proposed by Sticklen (1989). From the domain
of wheat crop modeling, we utilize CERES Wheat (Ritchie et al. 1985)
as a dynamic knowledge-base to provide predictions of the crop's
behavior. The Generic Task (GT) Approach allows developers of
knowledge-based systems to tackle complex problems through a method of
task decomposition and mapping of appropriate problem solving tools to
each identified subtask. The Knowledge Level Architecture (KLA)
provides a means of bringing together the problem solving tools into
one integrated architecture. The KLA proposes that a problem solver
assigned to a task be viewed as a cooperating agent. Communication
channels are then established between any two agents working together.
These communication channels define requests for service and
appropriate responses between the two cooperating agents. Normally,
the cooperating agents come from the Generic Task tool set of
identified task types. However, the problem of wheat crop management
requires precise predictions of the crop's behavior given the farmer's
circumstances and crop management decisions imposed by the developing
plan. These quantitative predictions are best performed by proven crop
simulation technology. Therefore, our system employs CERES Wheat, a
well established wheat crop model, to perform this predictive task.
* The term Neper comes from an early Egyptian god of agriculture.
Chandrasekaran, B. (1986). Generic Tasks in Knowledge-Based Reasoning:
High-Level Building Blocks for Expert System Design. IEEE Expert,
1(3), 23-30.
Ritchie, J. T., Godwin, D. C., & Otter-Nacke, S. (Ed.). (1985). CERES
Wheat. A Simulation Model of Wheat Growth and Development. College
Station, Texas: Texas A&M University Press.
Sticklen, J. (1989). Problem Solving Architectures at the Knowledge
Level. Journal of Experimental and Theoretical Artificial
Intelligence, 1, 1-52.
Several methodologies for the semantic integration of databases have been
proposed in the literature. These often use a variant of the
Entity-Attribute-Relationship model as the common data model. To aid the
schema conforming and merging phases of the semantic integration process
various transformations have been defined which map between EAR
representations which are in some sense equivalent.
Our work aims to formalise previous approaches by
- adopting a semantically minimal common data model,
- formally defining the notion of a valid transformation of one schema
into another, and
- identifying a minimal set of such transformations.
The common data model we use is a binary relational one comprising
entity types and binary relationships between them, including
inclusion relationships. By minimal we mean that any valid
transformation can be defined as a sequence of transformations from
the minimal set.
We differentiate between transformations which are general to all
extensions of the schema and those which require knowledge-based
reasoning since they apply only for certain extensions. This
distinction serves to enhance the performance of transformation tools
since it identifies which transformations must be verified by
inspection of the schema extension. It also serves to identify when
intelligent reasoning is required during the schema integration
process.
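The idea of a binary relational schema, and of any valid transformation being a sequence of primitive ones, can be sketched as follows. The primitive shown (entity renaming) and the representation are illustrative assumptions, not the paper's minimal transformation set:

```python
# Illustrative sketch of a binary relational common data model and the
# composition of schema transformations from primitives. The chosen
# primitive (entity renaming) is an invented example.

# A schema: entity types plus binary relationships (name, source, target).
schema = {
    "entities": {"Person", "Dept"},
    "relationships": {("works_in", "Person", "Dept")},
}

def rename_entity(s: dict, old: str, new: str) -> dict:
    """A primitive transformation: rename an entity type everywhere."""
    def sub(e):
        return new if e == old else e
    return {
        "entities": {sub(e) for e in s["entities"]},
        "relationships": {(n, sub(a), sub(b))
                          for n, a, b in s["relationships"]},
    }

def compose(*steps):
    """A valid transformation expressed as a sequence of primitives."""
    def run(s):
        for step in steps:
            s = step(s)
        return s
    return run

# Conforming one schema toward another before merging.
conform = compose(
    lambda s: rename_entity(s, "Person", "Employee"),
    lambda s: rename_entity(s, "Dept", "Department"),
)
conformed = conform(schema)
```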
A large-scale interoperable database system operating in a dynamic environment should provide a uniform access interface to its components, scalability to larger networks, evolution of database schemas and applications, flexible composability of client and server components, and preservation of component autonomy. To address the research issues presented by such systems, we introduce the Distributed Interoperable Object Model (DIOM). DIOM's main features include the explicit representation of and access to semantics in data sources through the DIOM base interfaces, the use of interface abstraction mechanisms (such as specialization, generalization, aggregation, and import) to support incremental design and construction of compound interoperation interfaces, the deferment of conflict resolution to query submission time instead of schema integration time, and a clean interface between distributed interoperable objects that supports the independent evolution and management of such objects. To make DIOM concrete, we outline the Diorama architecture, which includes important auxiliary services such as domain-specific library functions, object linking databases, and query decomposition and packaging strategies. Several practical examples and application scenarios illustrate the usefulness of DIOM.
In the last decade, several knowledge representation formalisms and reasoning techniques based on classes and relations have been investigated. This paper deals with the idea of exploiting such techniques in the integration of database schemas.
Traditional approaches to database integration require that a common key exist in all participating relations that model equivalent real-world entities, thereby compromising the logical heterogeneity of multidatabases. Recently, a few researchers have proposed using knowledge to identify equivalent entities without requiring a common key. This raises the issue of detecting potential inconsistency between data and knowledge in entity identification. We present three criteria for consistency in this context and consider incremental testing in the process of updating data and knowledge. High efficiency is obtained for updates of data, and reasonable efficiency for updates of knowledge. To see how practically useful the proposed framework and algorithms are, we conduct an experiment on a case study of three real-life databases. In this work, all local schemas are assumed to be translated into the relational model, but they are not required to share a common key.
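Key-less entity identification via knowledge can be caricatured as follows; the relations and the matching rule are invented for illustration and are not the paper's actual criteria:

```python
# Toy illustration of entity identification without a common key: a
# knowledge rule (here, a hand-written matching predicate) decides when
# tuples from two relations denote the same real-world entity. The
# relations and the rule are invented examples.

db1 = [{"name": "J. Smith", "dob": "1960-04-01"}]
db2 = [{"fullname": "John Smith", "birth": "1960-04-01"},
       {"fullname": "Jane Doe", "birth": "1971-09-12"}]

def same_entity(t1: dict, t2: dict) -> bool:
    """Knowledge rule: same birth date and matching surname."""
    return (t1["dob"] == t2["birth"]
            and t1["name"].split()[-1] == t2["fullname"].split()[-1])

matches = [(a, b) for a in db1 for b in db2 if same_entity(a, b)]
```

Consistency checking then amounts to verifying that such rules never link one tuple to two distinct entities in the same relation.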
The integration and management of multiple data and knowledge systems entail not only interoperability, local autonomy, and concurrent processing, but also new capabilities of adaptiveness allowing scalable (or incremental) integration of non-standard or legacy systems, while avoiding the well-known performance problems caused by traditional approaches to global control and management. The past decade has seen many good solutions developed from vast worldwide efforts for many aspects of the problem; yet certain needs remain unsatisfied, especially the new capabilities of adaptiveness. Most of the previous efforts have focused either on providing an architecture for direct interoperation among different systems (such as CORBA, the Common Object Request Broker Architecture), or on developing a global model to manage these systems in the tradition of databases (such as heterogeneous and distributed DBMSs). It seems that a promising approach to the problem is making the interoperable architecture adaptive and scalable by defining it through the global model; or, simply, the model-based architecture. Such an architecture is proposed in this paper, using the metadatabase model. The concept of metadata independence for multiple systems is presented as the basis for the conceptual design of the architecture, while the Rule-Oriented Programming Environment (ROPE) method is developed to execute the design. ROPE implements global (integration) knowledge in localized rule-oriented shells which constitute the control backbone of the interoperating architecture, and manages these shells. The metadatabase enables the shells to grow in number as well as to change their control knowledge contents. A prototype is developed to test the concepts and design.
The next decade of research in high performance computing and communications promises to deliver widely available access to unprecedented amounts of constantly expanding data. It is clear that many defense and commercial applications will benefit from learning new knowledge by integrating and analyzing very large amounts of widely distributed data to uncover and report upon subtle relationships and patterns of events that are not immediately discernible by direct human inspection. Although much progress has been made in developing new and useful machine learning algorithms that learn from examples, the computational complexity of many of these algorithms makes their use infeasible when applied to large amounts of inherently and physically distributed data. In order to provide the promised ability to learn new knowledge from large amounts of information, a central problem, which we call the scaling problem for machine learning, needs considerable attention. In this paper, we describe a general approach that we have come to call meta-learning. Meta-learning refers to a general strategy that seeks to learn how to combine a number of separate learning processes in an intelligent fashion. We desire a meta-learning architecture that exhibits two key behaviors. First, the meta-learning strategy must produce an accurate final classification system. This means that a meta-learning architecture must produce a final outcome that is at least as accurate as a conventional learning algorithm applied to all available data. Second, it must be fast relative to an individual sequential learning algorithm applied to massive databases of examples, and operate in a reasonable amount of time. To achieve scalable learning systems by meta-learning that are efficient in both space and time, we study solutions based upon parallel and distributed computing. Experimental results achieved on a large number of alternative meta-learning strategies are reported.
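A minimal sketch of the combining idea, assuming a toy one-dimensional base learner and a fixed majority-vote combiner (real meta-learning strategies also learn the combiner itself):

```python
# Toy meta-learning sketch: base classifiers are trained on separate
# partitions of the data (as if physically distributed), and a fixed
# majority vote over their predictions gives the final classification.
# The threshold base learner and the data are invented examples.
from collections import Counter
from statistics import mean

def train_base(examples):
    """Toy base learner on (x, label) pairs: threshold halfway between
    the class means, labelling above-threshold points 1."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= threshold else 0

def meta_combine(classifiers):
    """Combine independently trained classifiers by majority vote."""
    def predict(x):
        votes = Counter(c(x) for c in classifiers)
        return votes.most_common(1)[0][0]
    return predict

# One partition per site; base learners could be trained in parallel.
partitions = [
    [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)],
    [(0.0, 0), (0.3, 0), (0.7, 1), (1.0, 1)],
    [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)],
]
final = meta_combine([train_base(p) for p in partitions])
```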
Analysis of scientific data gathered by remote-sensing instruments should lead to the evaluation and validation of scientific hypotheses concerning natural and/or human-caused phenomena. Hypotheses may further exist in contention with other hypotheses (Open World Assumption) and must be based on sufficient grounds of knowledge. These grounds can be provided by different resources, i.e., a database for measurement and observation data, a knowledge base in the form of metadata, and a spatio-temporal data model for visualization purposes. New data can strengthen or weaken a scientific hypothesis. Mediators will be used for extracting and justifying scientific hypotheses by (dis)connecting data elements from the relevant data resources under consideration. The development of the scientific information system is based upon a federated architecture.
Firstly, we describe the problem space of heterogeneity across text-based, data-based, and knowledge-based information systems, including the role of object-oriented systems within that space. This description serves as the basis for defining interoperability requirements. Secondly, we describe a collection of tools developed at Cardiff, which provide services to overcome syntactic and semantic heterogeneity in network-, relational-, and object-oriented databases and which achieve interoperability between several paradigms. The achievements so far are set against the background of the requirements described before. Thirdly, we describe an architecture which integrates these services into a standard software toolkit. Fourthly, we outline a client-server architecture to show how the services could be used in an agent- or mediator-based environment where autonomous problem-solving capabilities may be required from the toolkit.
The Context Interchange strategy has been proposed as an approach for
achieving interoperability among heterogeneous and autonomous data sources
and receivers (Siegel and Madnick, 1991). In a recent paper (Goh, Madnick
and Siegel, 1994), we have argued that the Context Interchange strategy has
many advantages over traditional loose- and tight-coupling approaches
proposed in the integration literature. Our goal in this paper is to present
an underlying theory describing how those features can be realized. For this
purpose we define a data model, called COIN (COntext INterchange), which
describes (1) how domain and context specific knowledge can be represented
and organized for maximal sharing; and (2) how these bodies of knowledge can
be used to facilitate the detection and resolution of semantic conflicts
which may arise when data is exchanged between different systems. Within
this framework, ontologies exist as conceptualizations of particular domains,
and contexts as "elaborations" (or constraints) on existing descriptions of
objects. We show that when suitably constrained, these descriptions have an
elegant logical interpretation which allows knowledge originating from
ontologies, contexts and user queries to be uniformly used for the detection
of semantic conflicts. We conclude this paper by reporting on our experiences
with an implementation which provides integrated access to multiple financial
data services using the context interchange approach (Daruwala et al., 1995)
and describe how this has provided valuable insights on viable strategies for
transitioning this technology to production use.
References:
Adil Daruwala, Cheng Hian Goh, Scott Hofmeister, Karim Hussein,
Stuart Madnick and Michael Siegel. The Context Interchange Network.
To be presented at IFIP WG2.6 Sixth Working Conference on Database
Semantics (DS-6), Atlanta, Georgia, May 30 to June 2, 1995.
Cheng Hian Goh, Stuart Madnick, and Michael Siegel. Context
Interchange: Overcoming the Challenges of Large-Scale Interoperable
Database Systems in a Dynamic Environment. In Proceedings of the
Third Int'l Conf on Information and Knowledge Management, pages
337--346. Gaithersburg, MD, Nov 1994.
Michael Siegel and Stuart Madnick. A Metadata Approach to Resolving
Semantic Conflicts. Proceedings of the 17th International Conference
on Very Large Data Bases. 1991.
The virtual enterprises of the future must address infrastructure
issues such as authentication, authorization, security, recovery, etc.
when providing information to users outside the private domains of their
individual members. The National Industrial Information
Infrastructure Protocols consortium is developing a Reference
Architecture and a Reference Implementation of such an environment.
Central to this project is the role of Organizational and Resource
intelligent agents that act as high level decision making elements and
the role of Mediators and Negotiators who support the decision making
by establishing local "infobases" that resolve the semantic mismatches
that occur when enterprises, each with different tools, terminologies,
procedures and criteria, attempt to organize their process under a new
shared workflow.
The NIIIP protocols integrate the standards activities of several disparate
communities (the Internet Society, the Object Management Group,
ISO STEP, the Workflow Management Coalition, and I3) into a common set
of protocols, packaged across 13 components to facilitate binding to
COTS (commercial off the shelf) products. This paper will describe
the overall infrastructure, as well as the protocols used in each
component. The intended operation of the NIIIP infrastructure will
be described using a consortium demonstration scenario.
An important service in an intelligent information infrastructure is the
maintenance of the integrity of the stored information. Historically,
database management functions, such as transaction management and
data integrity constraints, were developed in part to relieve application
programmers of developing software that manages the consistency of the
application data. As applications that update information across multiple
resources are created, consistency of the information again becomes an
important management function which should be provided as a coordination
service to applications.
Recently, many extended transaction management models have been proposed to
maintain consistency in multidatabase environments. Similarly, interdatabase
constraint managers are being developed, some of which generate multidatabase
transactions or other triggering mechanisms to implement constraint
enforcement. The objectives of these works are partly related, yet a
consistent framework for how these coordination services should interoperate
is not defined.
In this paper, we analyze the fundamental goals of constraint and
transaction management in a multiple information resource environment and
how the work of consistency management should be divided between these
coordination services. From this analysis, we define a set of constraint
rules for the design of a new system architecture, [V], for the integration
of next-generation information resources.
The paper is intended to set forth the principles of a uniform
mathematical means of (a) describing structured meanings of arbitrarily
complicated real
discourses pertaining to science, technology, medicine, law,
business, etc., (b) representing knowledge about the world, (c)
building semantic representations of complicated visual images.
These principles are provided by the theory of K-calculuses and
standard K-languages, which form the central constituent of Integral
Formal Semantics (IFS), a new and powerful approach to the
formalization of the semantics and pragmatics of natural language (NL)
developed by V.A. Fomichov and presented, in particular, in
several large publications in English.
The results to be stated in the paper open broad and unique
prospects for designing full-text databases, visual
information management systems, and hybrid knowledge representation
languages.
A formal heterogeneous software design [28] begins with formal
specifications (see [2,8,26] for example) for the programs and stores
them in a knowledge base [25]. A formal specification is based on a
formal language and makes use of defining axioms and possibly
mathematical structures to characterize modules or programs and to
define software agents [32]. A formal specification in our approach
consists of a signature specifying the type names, and the operations
on types, including the rank and the arity of the operations. It also
includes a set of axioms which recursively define the operations.
Such specifications capture the intended meaning of the possible
computation sequences applying a given set of objects and operations,
thus specifying a program[1]. Rather than tuning the specifications
for efficiency, we argue that efficiency can be achieved by
transformation. Note that transformations are one method of program
tuning, but tuning the specifications is not achieved through program
transformation techniques.
The specifications are tuned by discovering new rules or defining
axioms which can subsume other such axioms, or introducing rules which
can be more efficiently realized given a target implementation
language. Completing an incomplete set of defining rules through
implementation constraints or gradual domain knowledge learning is
another important aspect of specification tuning. Meta-programming is
a technique for syntax tree mapping from one language to another. We
have put forth the paradigm of mapping the specifications to a
high-level language for which an optimizing compiler already
exists [15,21,6,8].
The programs that literally implement the mappings are defined by
meta-programs [18,23], but with a prudent choice of the syntax tree
maps during meta-programming, such that correct and efficient
implementation maps are attained. A meta-program is a program that
manipulates programs, in the sense that the objects on which the
meta-program's functions act are program constructs of a source
language. Meta-programs work at the level of abstract syntax trees and
allow us to transform the syntax trees in the specification language,
to the syntax trees for a target high level language code, while
allowing us to manually code-in the implementation map[1] during the
translation. Implementations are homomorphic maps of the syntax trees
in one language to another, such that the semantics (the intended
meaning) of the specifications are preserved.
The trees defined by the present approach have function names
corresponding to computing agents. The computing agent functions have
a specified module defining their functionality. These definitions are
applied at syntax tree implementation time. The homomorphism is
defined by setting the correspondence between syntax trees in the
specification language and the target language, which consists of
executable code syntax trees, through the user defined meta-programs.
The mapping preserves the operation on trees, in the image algebra of
trees for the target language.
By the image algebra of trees we intend the image of the abstract
syntax tree algebra for a source language grammar into the algebra of
abstract syntax trees for the target grammar. Such an approach allows
us to readily modify specifications and simply run through the mapping
process to produce alternate executable code for slightly different
machines or environments.
An abstract implementation is essentially a homomorphism of one
abstract algebra into another abstract algebra, mapping the algebra of
syntax trees and the associated semantic equations for a source
language (the specification language) to that of the algebra of syntax
trees and their semantic equations, for a target concrete language (a
programming language, such as LISP).
To give at least one example use of abstract data types in automatic
programming we refer to [27]. The correctness of implementation
problem is that of logically ensuring that the homomorphism is
realized correctly, satisfying some required properties. The
implementation homomorphism is automatically defined through the
inductive properties of syntax trees, through the process of algebraic
extension. By defining a meaning function for the trees defined by the
type constructor functions, one automatically derives a homomorphism
by algebraic extension, on the entire set of trees for the particular
syntax.
For example, if h is a homomorphism on syntax trees, with h(+(t1,t2))
= +(h(t1),h(t2)), then it is sufficient to know the value of h([t1])
and h([t2]), where [t] denotes a canonical term (built entirely from
type constructor functions and constants), equivalent to t. This is
because by definition any term t has to be congruent (algebraically
equivalent), through the equations defining the operations, to a
canonical one, see ([1] and [5]).
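The structural-recursion argument above can be sketched directly: defining h on canonical leaves extends it to all trees by the homomorphism property h(+(t1,t2)) = +(h(t1),h(t2)). Tuples stand in for abstract syntax trees here, and the leaf map is a hypothetical example, not the Compose-to-Prolog mapping of [21,22].

```python
# Sketch of an implementation homomorphism on syntax trees: it is
# determined by its values on canonical leaves and extended to all
# trees by structural recursion, preserving each operator node.
def hom(tree, leaf_map):
    """Extend a map on leaves to all trees by structural recursion."""
    if not isinstance(tree, tuple):           # a canonical leaf
        return leaf_map(tree)
    op, *children = tree
    return (op, *(hom(c, leaf_map) for c in children))

# Example: implement source-language integer literals as strings,
# leaving the tree shape (the operators) intact.
src = ('+', ('*', 1, 2), 3)
tgt = hom(src, str)
print(tgt)   # ('+', ('*', '1', '2'), '3')
```

The recursion mirrors the equation h(+(t1,t2)) = +(h(t1),h(t2)): only the leaf map is supplied by hand; everything else follows by algebraic extension.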
As an example, in the process of meta-programming the algebraic
specification language Compose to Prolog in [21,22], the homomorphism
was implicitly realized through the process of reverse Skolemization,
in which a Prolog predicate is defined for each type constructor
function for a given algebraic program specification. The proof of
correctness of implementation can be carried out easily by checking
that the homomorphism is realized correctly [1]. It is a process that
can make use of automatic verification and the mathematical
techniques, such as the ones proposed in [24] for tree mapping
correctness, and by this author in [1].
We present new techniques for design by software agents and new concepts of Abstract Intelligent Implementation of AI systems (AII). The stages of conceptualization, design, and implementation are defined by AI agents and mediators. Multiagent implementations are proposed to facilitate a software design methodology which incorporates the object-level nondeterministic knowledge learning and knowledge representation methods developed in [12]. Objects, message-passing actions, and implementing agents are defined by syntactic constructs, with agents appearing as functions, expressed in an abstract specification language capable of specifying modules, agents, and their communications ([11], for example). By defining Agent Provocateurs, events and activity are computed for the AII agents. The proposed abstract intelligent implementation techniques provide a basis for an approach to automatic implementation by Intelligent Free Trees from knowledge presented in an algebraic parameterized language. The object-level definitions for individual modules are turned into executable programs by source abstract syntax tree to target abstract syntax tree morphisms. AII techniques are applied to define an Ontology Preservation Principle. An overview of validation and verification of AI systems is presented as a direct application of the above AII techniques.
Two different aspects of data management are addressed by description logics (DL) and databases (DB): the semantic organization of data and powerful reasoning services (by DL), and their efficient management and access (by DB). This paper shows how assertional knowledge of a DLMS and data of a DBMS can be uniformly accessed. Our extended paradigm integrates the separately existing retrieval functions of DL and DB in order to allow, via a query language grounded on DL-based schema knowledge, the uniform formulation and answering of queries for retrieving data from mixed knowledge/data bases. In this way the advantages of DL and DB are brought together. Thus, this technique can be used in a DLMS for the efficient management of large amounts of data, and in a DBMS for the semantic organization of unstructured, and possibly heterogeneous and distributed, databases.
Significant roadblocks to intelligent information and services integration include: (1) misinterpretation of data across contexts, applications, and users; (2) lack of seamless function integration among distributed applications; and (3) lack of support for change propagation across enterprise contexts. The Integrated Development System Environment (IDSE) is a distributed computing environment that provides automated support for both information and function integration among distributed applications. The foundation for both types of integration is the use of ontologies. Information integration is achieved through the interpretation of data and the dynamic propagation and enforcement of constraints based on ontological descriptions. Function integration is achieved through an Integration Service Manager (ISM) that can process service requests from applications based on ontological knowledge of the tools integrated into the environment. This paper describes the implementation of the IDSE and explains how the environment supports the intelligent integration of both information and services. Successes as well as problems and limitations encountered during the implementation of the IDSE are also discussed.
The first step in interoperating among heterogeneous databases is
semantic integration: Producing metadata that describes relationships
between attributes or classes in different database schemas. The
process of semantic integration cannot be "pre-programmed" since the
information to be accessed is heterogeneous. Intelligent information
integration involves automatically extracting semantics, expressing
them as metadata and matching semantically equivalent data elements.
Semint (SEMantic INTegrator), developed at Northwestern University, is
a prototype mediator that assists in semantic integration.
Semint supports access to a variety of database systems and utilizes
both schema information and data contents to produce matching rules
between database schemas. In Semint, the knowledge of how to match
equivalent data elements is "discovered", not "pre-programmed".
In this paper we will provide theoretical background and
implementation details of Semint. Experimental results from running
Semint on large and complex databases will be presented. We discuss
the effectiveness of different types of metadata (discriminators) in
determining attribute similarity. We also introduce a framework for a
dynamic semantics-based query language for multidatabase systems and
discuss other applications (such as digital libraries) that could use
Semint as part of a complete semantic integration service.
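The discriminator idea above can be sketched as follows; the specific discriminators (type code, length, nullability) and the nearest-neighbour matching are illustrative assumptions, since Semint itself trains a neural network on such metadata vectors rather than using a fixed distance.

```python
# Hedged sketch of discriminator-based attribute matching in the spirit
# of Semint: each attribute is summarized by a vector of metadata
# discriminators, and attributes of two schemas are matched by distance.
def distance(v1, v2):
    """Euclidean distance between two discriminator vectors."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

def match_attributes(schema_a, schema_b):
    """Pair each attribute of schema_a with its nearest neighbour in schema_b."""
    matches = {}
    for name_a, vec_a in schema_a.items():
        matches[name_a] = min(schema_b, key=lambda n: distance(vec_a, schema_b[n]))
    return matches

# Hypothetical discriminator vectors: (type code, max length, nullable).
emp = {'emp_name': (1, 40, 1), 'salary': (2, 8, 0)}
staff = {'wage': (2, 8, 0), 'full_name': (1, 45, 1)}
print(match_attributes(emp, staff))
# {'emp_name': 'full_name', 'salary': 'wage'}
```

The point of the sketch is that the matching rules are derived from the metadata itself, not pre-programmed per schema pair.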
The paper will contain technical approaches to an intelligent integration of information for the purpose of environmental protection including
This paper describes an intelligent assistant that is designed to help information specialists select and combine data sources to produce new information services. Rather than attempting to fully plan out and answer queries autonomously, the assistant provides interactive knowledge-based support for a specialist designing a query plan. The assistant helps its user deal with the heterogeneity of data sources at two levels. First, it deals with schema heterogeneity by describing sources' content in the vocabulary of a central conceptual ontology. Users refer to concepts in this ontology, and the assistant finds the source(s) needed to retrieve or construct those concepts. Second, it deals with instance-level heterogeneity -- where data items in sources refer to the same concept instance differently -- by providing an extended set of relational operators to be used in query plans. In addition to standard relational operators (equi-join, select, etc.), the assistant provides heuristic operators such as heuristic join [Huffman&Steier95], that use heuristic matching to integrate sources with instance-level heterogeneity. Finally, the assistant makes use of meta-information about sources such as access cost and accuracy in building query plans.
Integrating multiple sources of information into one unified database requires more than structurally integrating diverse database access methods and data models. In applications where the data is corrupted (incorrect or ambiguous), the problem of integrating multiple databases is particularly challenging. We call this the "merge/purge" problem. The key to successfully solving merge/purge is "semantic integration", which requires a means of identifying "similar" data from diverse sources. We use a rule program that declaratively determines when two pieces of information are similar and represent some aspect of the same domain entity. However, since the number and size of the data sets involved may be large, the number of records to be compared at a time by the rule program must be limited to a small number of "good" candidates. Also, large-scale parallel and distributed computing systems may be the only hope for achieving good performance in a reasonable amount of time at acceptable cost. In this paper we describe the "sorted neighborhood" method for solving merge/purge and provide experimental results that suggest this method will work well in practice (reasonable execution times and accurate results). As expected, a tradeoff between accuracy and execution time exists. We explore means of improving the accuracy of the results without severely affecting the execution time of our algorithms.
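The sorted-neighborhood method can be sketched minimally: sort the records on a discriminating key, then slide a small fixed-size window over the sorted list and apply the expensive matching rule only to records inside the same window. The sort key and match rule below are illustrative assumptions, not the paper's actual rule program.

```python
# Minimal sketch of the sorted-neighborhood method for merge/purge.
def sorted_neighborhood(records, key, is_match, window=3):
    """Compare each record only with its window-1 predecessors in key order."""
    ordered = sorted(records, key=key)
    pairs = []
    for i, rec in enumerate(ordered):
        for other in ordered[max(0, i - window + 1):i]:
            if is_match(rec, other):
                pairs.append((other, rec))
    return pairs

# Toy (name, address) records with a slightly corrupted name.
recs = [('Smith', '123 Elm'), ('Smyth', '123 Elm'), ('Jones', '9 Oak')]
dupes = sorted_neighborhood(
    recs,
    key=lambda r: r[0][:2] + r[1],                 # crude sort key
    is_match=lambda a, b: a[1] == b[1] and a[0][0] == b[0][0],
)
print(dupes)   # [(('Smith', '123 Elm'), ('Smyth', '123 Elm'))]
```

The window bounds the quadratic pairwise comparison to O(n * window) rule evaluations after an O(n log n) sort, which is the source of the accuracy/time tradeoff the abstract mentions.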
Research in database integration has been directed for the most part at resolving schema-level incompatibility issues. However, integrating all records which represent the same real-world entity is an important task in performing database integration, which has not been addressed much in the literature. A common identification mechanism for similar records across heterogeneous databases is usually not available. Entity identification, i.e., the task of integrating records from different databases that represent the same entity, in such cases can be performed by examining the relationships between various attribute values among the records. This process makes use of additional knowledge of data semantics available to the users familiar with the data. We propose the use of distances between attribute values as a measure of closeness between the records they represent. Record matching conditions for entity identification can then be expressed as a suitable combination of these pairwise attribute distances. Using a distance-based approach for matching records allows the design of more efficient and effective methods for entity identification. Due to various data sources having been developed independently, and often by different groups of individuals, there is no easy way to obtain the record matching conditions. Our approach uses knowledge discovery techniques to automatically derive these conditions (expressed as decision trees) from the data. In this paper we describe the distance-based framework for performing the instance-level integration of databases. The results we obtained from performing entity identification on real-world databases from the telecommunication industry are presented. The results from our experiments demonstrate that our approach is highly effective for performing entity identification.
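The distance-combination idea can be sketched as follows; the attribute distances, weights, and threshold below are illustrative assumptions, since the paper derives such matching conditions automatically from the data as decision trees rather than fixing them by hand.

```python
# Sketch of distance-based entity identification: per-attribute
# distances are combined into a record-matching condition.
def edit_like(a, b):
    """Crude string distance: fraction of character positions that differ."""
    n = max(len(a), len(b))
    return sum(x != y for x, y in zip(a.ljust(n), b.ljust(n))) / n

def records_match(r1, r2, threshold=0.25):
    """Hypothetical matching condition over two attribute distances."""
    name_d = edit_like(r1['name'], r2['name'])
    phone_d = 0.0 if r1['phone'] == r2['phone'] else 1.0
    return 0.7 * name_d + 0.3 * phone_d < threshold

a = {'name': 'John Q. Smith', 'phone': '555-1234'}
b = {'name': 'John O. Smith', 'phone': '555-1234'}
print(records_match(a, b))   # True: one differing character, same phone
```

Expressing the condition as a weighted combination of pairwise attribute distances is what makes it learnable: a decision-tree inducer can search over such thresholds instead of a human encoding them.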
We present a query mediation approach to the interoperation of autonomous heterogeneous databases containing data with semantic and representational mismatches. We develop a mediation architecture of interoperation that facilitates query mediation, and formalize the semantics of query mediation. The main contributions are the automated mediation of queries between databases, and the separation of semantic heterogeneity from representational heterogeneity. Query mediation in heterogeneous legacy databases makes both the data and the applications accessing the data interoperable. Automated query mediation relieves users from the difficult task of resolving semantic and representational mismatches. Decoupling semantic and representational heterogeneity improves the efficiency of automated query mediation. Our approach provides a seamless migration path for legacy databases, enabling organizations to leverage off investments in legacy data and legacy applications.
Many automated information systems need to: (1) transform and
cache information from dynamic, shared databases, (2) reason about the
current state of those data, and (3) perform long-running tasks but
cannot lock the objects about which they are reasoning, so as to allow
concurrent access by other applications. Many of these applications
can tolerate some deviation between the state of their caches and that
of the shared databases, as long as this deviation is within specified
tolerances. This paper describes an active agent approach to cache
management for such applications.
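Tolerance-based caching of this kind can be sketched minimally. In the described approach, active database rules on the shared source enforce the tolerance and push refreshes; the pull-based check below is only an illustrative stand-in, and the numeric-deviation measure and names are assumptions.

```python
# Minimal sketch of a cache that tolerates bounded deviation from its
# shared source: the cached copy is refreshed only when the deviation
# exceeds an application-specified tolerance.
class QuasiCache:
    def __init__(self, read_source, tolerance):
        self.read_source = read_source
        self.tolerance = tolerance
        self.cached = read_source()
        self.refreshes = 0

    def get(self):
        current = self.read_source()
        if abs(current - self.cached) > self.tolerance:
            self.cached = current          # deviation too large: refresh
            self.refreshes += 1
        return self.cached

source = {'v': 100.0}
cache = QuasiCache(lambda: source['v'], tolerance=5.0)
source['v'] = 103.0
print(cache.get())   # 100.0 -- within tolerance, stale copy served
source['v'] = 110.0
print(cache.get())   # 110.0 -- tolerance exceeded, cache refreshed
```

The payoff is concurrency: because small deviations are tolerated, the application never needs to lock the shared objects it reasons about.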
In previous work [SelKer93a, SelKer93b, SelKer95], we described
the data consistency requirements of applications with the
characteristics described above and proposed an architecture for
addressing those requirements. The approach has some unique features:
(1) it permits applications to specify their data consistency
requirements using a declarative language, (2) it automatically
generates the rules and other database objects necessary to enforce
those consistency requirements, shielding the application developer
from the implementation details of consistency maintenance, and (3)
it provides an explicit representation of consistency constraints in
the database, which allows them to be reasoned about and changed
dynamically to adapt to evolving situations.
Since the publication of [SelKer93a, SelKer93b, SelKer95],
the proposed approach has been refined, implemented, and used to
construct an application. This paper makes a number of new
contributions. First, it introduces and formalizes quasi-view
objects, which extend quasi-caching to support the transformation of
objects before they are cached. Second, it presents a declarative
language for specifying quasi-views, based on a modest extension to
SQL. Third, it presents techniques for automatically generating (from
a declarative quasi-view specification) active database rules to
maintain the quasi-view. The approach represents "staleness"
conditions explicitly, so that they can be queried and manipulated by
users and applications. Fourth, it describes an implementation of the
approach in prototype software. Finally, the paper presents a cost
model that demonstrates that the approach has the potential to scale
up to large databases having many quasi-views.
[SelKer93a] Seligman, L. Kerschberg, "An Active Database
Approach to Consistency Management in Data- and Knowledge-based
Systems," International Journal of Intelligent and Cooperative
Information Systems (IJICIS), 2(2), 1993.
[SelKer93b] Seligman, L. and L. Kerschberg, "Knowledge-base/Database
Consistency in a Federated Multidatabase Environment," IEEE Research
Issues in Data Engineering: Interoperability in Multidatabase Systems,
Vienna, Austria, IEEE Computer Society Press, 1993.
[SelKer95] Seligman, L. and L. Kerschberg, "Federated Knowledge and
Database Systems: A New Architecture for Integrating AI and
Database Systems," Advances in Databases and Artificial Intelligence,
Vol. 1: The Landscape of Intelligence in Database and Information
Systems. L. Delcambre and F. Petry, JAI Press, 1995.
Information integration is enabled by having a precisely defined
common terminology. We call this combination of terminology and
definitions an ontology. We have developed a set of tools and services
to support the process of achieving consensus on such a common ontology
by distributed groups. These tools make use of the World-Wide Web to
enable wide
access and provide users the ability to publish, browse, create, and
edit ontologies stored on an ontology server. Users can quickly
assemble a new ontology from a library of modules. We discuss how our
system was constructed, how it exploits existing protocols and browsing
tools, and our experience supporting hundreds of users. We describe
applications using our tools for achieving consensus and integrating
information.
The Internet provides dramatic new opportunities for gathering
information from multiple, distributed, heterogeneous information
sources. However, this distributed environment poses difficult
technical problems for the information-seeking client, including
finding the information sources relevant to an interest, formulating
questions in the forms that the sources understand, interpreting the
retrieved information, and assembling the information retrieved from
several servers into a coherent answer.
We describe and demonstrate enabling technology for addressing these
problems, particularly as they occur in Electronic Commerce applications.
We focus on techniques needed to enable a marketplace of network-based
information brokers that retrieve information about services and products
whose descriptions are available via the Internet from multiple vendor
catalogs and data bases. The services provided by such brokers include:
- Facilitating a human or computer client in the task of formulating a
query in a domain-specific vocabulary provided by the broker.
- Identifying information sources that are relevant to answering a
query.
- Translating a query into the ontology and syntax required by a given
information source, obtaining responses to the query, and
translating the responses into the broker's ontology and syntax.
- Aggregating, presenting, and explaining the responses to a query.
Our goal is to enable vendor and buyer communities to build
their own information brokers. To do this, we will solve a set of
technical problems involved in brokering, embody the solutions in an
information brokering architecture, build tools that facilitate the
construction of brokers using that architecture, and build example
brokers using the architecture and tools. It is essential to reduce
the cost of integrating information sources and to provide a path that
allows for incremental integration that can be responsive to client
demands. We present an approach to integrating disparate
heterogeneous information sources that uses context logic. Our use of
context logic reduces the up-front cost of integration, provides an
incremental integration path, and allows semantic conflicts within a
single information source or between information sources to be
expressed and resolved.
Determining the correspondences between different database schema specifications is the most difficult and time-consuming activity to be performed during the construction of an interoperable database schema. Such correspondences may be the source of conflicts when integrating the schemas and thus must be detected and resolved. A manual inspection of the class definitions in each database and a comparison with each class definition in the other participating databases may result in an almost endless process. Moreover, this process of schema comparison has so far been a purely manual activity. To support a federation manager during this activity we propose a computerized tool to extract the semantics from schema definitions, to transform them into a unique vector representation of each class, and to use the class vectors to train an artificial neural network in order to determine categories of classes. The output of the tool is a `first guess' as to which concepts in the schemas may be overlapping and which concepts do not overlap at all. This may be of tremendous value because the designers are relieved of the burden of manually inspecting all the classes and can direct their focus to classes grouped by the tool into the same category.
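The class-vector categorization can be sketched as follows; simple cosine-similarity grouping stands in for the tool's trained neural network, and the feature vectors are hypothetical.

```python
# Hedged sketch: class definitions are reduced to fixed-length feature
# vectors, which are then grouped into candidate categories of
# possibly-overlapping classes by similarity.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(a * a for a in w) ** 0.5
    return dot / (norm(u) * norm(v))

def categorize(classes, threshold=0.98):
    """Greedily group class vectors whose similarity exceeds threshold."""
    categories = []            # list of (representative vector, member names)
    for name, vec in classes.items():
        for rep, members in categories:
            if cosine(rep, vec) >= threshold:
                members.append(name)
                break
        else:
            categories.append((vec, [name]))
    return [members for _, members in categories]

# Hypothetical vectors: (n attributes, n string attrs, n numeric attrs).
classes = {'db1.Person': (5, 3, 2), 'db2.Employee': (5, 3, 2),
           'db1.Invoice': (8, 1, 7)}
print(categorize(classes))   # [['db1.Person', 'db2.Employee'], ['db1.Invoice']]
```

Each resulting group is only a `first guess' for the federation manager to inspect, exactly as the abstract describes.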
Interoperability between databases provides a uniform
way to access data stored over different sites. An
object-oriented data model is generally used to reduce
heterogeneity amongst the database components (e.g., [18,19,20]),
in which a global schema is defined from an integration of local
schemas. A user expresses his/her requirements using the local
query language(s), and these are then translated to the global
schema. In this paper we address these transformations. We provide
a framework which transforms a relational schema into an
object-oriented schema identifying implicit knowledge contained
within relations, keys and referential integrity constraints.
To do so, a relational schema is classified into three
subsets of relations to reflect the different concepts of an
object-oriented schema:
. Base relations: they are relations which are independent
of other relations of the database. They are translated
directly into re-usable classes.
. Dependent relations: they express relationships
between two base relations. They generally simulate either
binary relationships, such as aggregations, or simple
inheritance relationships.
. Composite relations: these relations are the
generalization of dependent relations and express relationships
among different relations. A composite relation simulates either
an association class or multiple inheritance between existing
classes.
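The three-way classification above can be sketched as follows, assuming each relation is summarized by the list of relations it references through foreign keys. The relation names and the exact classification conditions are illustrative simplifications, not the paper's rules.

```python
def classify(relations):
    """Split a relational schema into base, dependent and composite relations.

    - base:      no references to other relations -> translated into classes
    - dependent: references exactly two base relations -> a binary
                 relationship (aggregation or simple inheritance)
    - composite: references more than two relations -> association class
                 or multiple inheritance
    """
    base = {r for r, fks in relations.items() if not fks}
    dependent = {r for r, fks in relations.items()
                 if len(fks) == 2 and set(fks) <= base}
    composite = set(relations) - base - dependent
    return base, dependent, composite

# Hypothetical schema: relation name -> referenced relations.
schema = {
    "Person":     [],
    "Company":    [],
    "Project":    [],
    "Employment": ["Person", "Company"],             # binary relationship
    "Assignment": ["Person", "Company", "Project"],  # ternary relationship
}

base, dependent, composite = classify(schema)
```

Under this toy schema, `Person`, `Company` and `Project` become kernel classes, `Employment` a reference between them, and `Assignment` an association class.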
Based on the above classification, appropriate translation rules and
algorithms are provided to generate a sub-schema of the global schema.
Since queries on a global schema are expressed using local query
language(s), relational algebraic expressions are also transformed
into an object-oriented model. To do so, every algebraic expression
is decomposed according to the types of relations involved in the
expression (such as base, dependent and composite relations). The
result of the decomposition is a tree, called algebraic tree, in which
- a node represents a subquery which relates to a single object-oriented
class (i.e., class local query), and
- an edge of the tree models the dependencies between the subqueries.
Every algebraic tree is implemented as a set of procedural methods.
Indeed, a node is implemented as a method using predefined methods of
the corresponding class (e.g., get() and set() methods). An edge between
two nodes is implemented as a calling relationship between the methods
related to the nodes of the algebraic tree. Appropriate algorithms which
generate methods for relational algebraic expressions are also provided.
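As a rough illustration of the algebraic tree described above, the sketch below models a node as a class-local subquery and an edge as a calling dependency between subqueries. The class name, attribute names and data are hypothetical, not drawn from the paper.

```python
class AlgebraicNode:
    """One node of an algebraic tree: a subquery against a single class."""

    def __init__(self, class_name, operation, children=None):
        self.class_name = class_name  # the o-o class this subquery targets
        self.operation = operation    # callable: (objects, child_results) -> result
        self.children = children or []

    def evaluate(self, extents):
        """Evaluate bottom-up: a node 'calls' the results of its children."""
        child_results = [c.evaluate(extents) for c in self.children]
        return self.operation(extents[self.class_name], child_results)

# Hypothetical class extent.
extents = {"Employee": [{"name": "Ann", "salary": 50},
                        {"name": "Bob", "salary": 30}]}

# A selection node feeding a projection node (an edge between subqueries).
select_node = AlgebraicNode(
    "Employee", lambda objs, _: [o for o in objs if o["salary"] > 40])
project_node = AlgebraicNode(
    "Employee", lambda objs, kids: [o["name"] for o in kids[0]],
    children=[select_node])

result = project_node.evaluate(extents)  # -> ["Ann"]
```

The recursive `evaluate` call plays the role of the calling relationship that the paper implements between the methods generated for adjacent nodes.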
OVERVIEW OF THE APPROACH
Cooperation between autonomous databases has been an area of great
interest in the last few years. This capability is called
interoperability, and the system which manages it is called a
federated database system [18,8]. To allow
interoperability, a "rich" data model is used as a canonical model
where all local information can be expressed using the concepts of
this model. Object-oriented models are generally considered as
"good" canonical models because they provide richer abstractions
than those of existing models (e.g., the relational model)
[18,19,20]. Each local database system (LDS) supports export/import
mechanisms to the canonical model, and LDSs use transformations that
translate schema (or sub-schema), data or queries to/from the canonical
model. This paper addresses these transformations and provides a
framework which allows the generation of o-o schema and algebraic
operations from a relational database.
The translation between data models has been addressed by several
researchers. Initially Zaniolo [24] developed a tool that automatically
generates relational schemas from CODASYL schemas. The approaches
provided in [13,5] are concerned with transformations between
extensions of the ER (Entity Relationship) and the relational data
models. Lien [11] described mappings from the hierarchical to the
relational data model. Tsichritzis and Lochovsky [17] provided a
summary of different types of mappings. With advances in research
on federated databases, translation has become a key issue
because of the necessity to access heterogeneous information. Several
translation approaches have been proposed in the context of federated
databases (e.g., [4,7,12]). Castellanos and Saltor, in [4], have
used enrichment techniques to identify o-o constructs from a
relational schema. Ling [12] proposed an approach similar to that
of [4], with some extensions for aggregation relationships.
The existing translation approaches above are useful; however, their
usability is limited in the context of o-o databases. Most of these
approaches do not provide a translation framework that is consistent
with the object paradigm. The following is a summary of the problems
with the existing translation approaches. Most of the approaches
. do not fully take into account the complexity of o-o models.
Generally they provide a "partial" mapping of relational
schemas in which few concepts of o-o schemas are covered
(e.g., class identification). Also, (simple and multiple)
inheritance and other forms of aggregation,
such as associations, are not considered.
. still use the relational "philosophy" for building an o-o schema.
However, in general, the o-o philosophy is founded on an iterative
and incremental design approach (e.g., Booch [3], Coad & Yourdon
[23] and so on) in which the final o-o design is a refinement of
the initial design.
. address only some aspects of relational database applications.
Other aspects, such as algebraic expressions, are not
translated.
In this paper we propose a translation methodology that generates an
o-o database and overcomes the problems described above. It is worth
mentioning that the proposed methodology results in revealing implicit
semantics within relational specifications which are subsequently made
explicit - as much as possible - within the target o-o database. The
proposed work extends the results on method translation
[19,21,14] into a general translation framework that includes
both schema and algebra translation.
. The mapping process we propose is consistent with the o-o
"philosophy" (e.g., Booch [3], Coad & Yourdon [23]) in the sense
that the building of an o-o application is an incremental and
iterative process in which the final design is obtained by
successive refinements. We first identify those relations that
will serve as kernel classes for building the whole o-o
schema. Such relations are called base relations and
are translated into o-o classes. In the second step, those relations
that simulate binary relationships (i.e., references) between
classes are identified. Nested binary relationships
are also identified at this step. These nested binary relationships
simulate nested aggregations between o-o classes and at the same
time "behave" as ternary relationships in the relational
application. Relations which represent simple or nested binary
relationships are called dependent relations. These relations
are either translated as references or inheritance relationships
between classes.
The final step of the translation is concerned with the relations
that simulate either multiple inheritance or association between
classes. These relations are called composite relations. They
are the relations which remain in a schema after all the base and
dependent relations are translated. Composite relations are
translated into either associations or multiple inheritance.
. After an o-o schema is generated from a relational schema by
following the steps described above, the algebraic queries are
then translated. The paper focuses on the following three relational
operations: selection, projection, and join. The mapping of these
operators is closely dependent on the type of the relations
specified within the algebraic expressions. For instance, if an
algebraic expression uses a composite relation, then the appropriate
sub-queries are generated on the classes that simulate the composite
relation. Every decomposition of an algebraic operation
produces a tree, called an algebraic tree, in which
. a node of the tree relates to a sub-operation on an o-o class, and
. an edge represents a calling relationship between sub-operations.
Every algebraic tree is translated as a procedural method. To do
this, the nodes are translated into local methods by using predefined
methods, such as the get() and set() operations of classes. The edges
are implemented as calling relationships between two local methods
embedded in the nodes of algebraic trees.
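The translation of an algebraic tree into methods can be sketched as a small code generator: each node becomes one method, and each edge becomes a call from the parent's method to the child's. The node identifiers, the `query_` naming scheme and the dictionary-based extents are illustrative assumptions, not the paper's notation.

```python
def emit_methods(tree):
    """tree: dict node_id -> (class_name, predicate_src, child_ids).
    Returns Python source defining one method per node of the tree."""
    lines = []
    for node_id, (cls, predicate, children) in tree.items():
        # An edge is implemented as a call to the child node's method;
        # a leaf reads the extent of its class directly.
        calls = " + ".join(f"query_{c}(db)" for c in children) or f"db['{cls}']"
        lines.append(f"def query_{node_id}(db):")
        lines.append(f"    # subquery on class {cls}")
        lines.append(f"    return [o for o in ({calls}) if {predicate}]")
    return "\n".join(lines)

# A two-node tree: a selection leaf n2, called by the root n1.
tree = {
    "n2": ("Employee", "o['salary'] > 40", []),
    "n1": ("Employee", "True", ["n2"]),
}

namespace = {}
exec(emit_methods(tree), namespace)          # define the generated methods
db = {"Employee": [{"salary": 50}, {"salary": 30}]}
answer = namespace["query_n1"](db)           # -> [{'salary': 50}]
```

In the paper the generated methods would instead use the predefined get() and set() operations of the target classes; the generator here only conveys the node-to-method and edge-to-call correspondence.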
The introduced algebraic trees facilitate the implementation
of relational operations in object-oriented databases. In fact, if
a complex algebraic operation is defined on a set of relations,
algebraic trees are derived for every selection, projection, and join
operation. The nodes of these trees are typed by the classes to
which they relate. The nodes of the algebraic trees which relate to a
common type are merged to form more complex nodes.
ORGANISATION OF THE PAPER
In this paper we develop the above translation framework. The next section
introduces the preliminary definitions and notation. Section 3 provides
concepts and rules to generate an object-oriented schema from a relational
schema. Translation rules for relational algebraic expressions are described
in section 4. In section 5 we propose algorithms that generate the
object-oriented specification of a relational database. Finally, section 6
concludes with possible extensions of the proposed approach.
REFERENCES
[1] A. Anderson, W. Caelli, D. Longley, V. Murthy,
M. Papazoglou and Z. Tari: Risk Analysis Project. Queensland
University of Technology, Information Security Research Center, 1994.
Project Leader: Dr. Zahir Tari. Six volumes have been produced: The
Security Architecture (Vol 1), Risiko Daten Speicher (RDS) (Vol. 2),
Platform Domain (Vol. 3), Information Assets and Process Domain (Vol.
4), Mappings (Vol. 5) and QUT-NT Prototype for Risk Analysis (Vol. 6).
[2] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich,
D. Maier, and S. Zdonik: The Object-Oriented Database System Manifesto.
Proc. of 1st Deductive Object-Oriented Database Conf., Kyoto, 1989.
[3] G. Booch: Object-Oriented Analysis and Design. Addison Wesley, 1994.
[4] M. Castellanos and F. Saltor: Semantic Enrichment of Database Schemas:
An Object Oriented Approach. Proc. of the 1st Workshop on Interoperability
in Multidatabase Systems, April 1991, Kyoto.
[6] S. Ceri: Methodology and tools for database design. North-Holland,
Amsterdam, 1983.
[7] U.S. Chakravarthy: Semantic Query Optimisation in Deductive Databases.
Ph.D. Thesis, Dept. of Computer Science, University of Maryland, 1986.
[8] L.A. Kalinichenko: Methods and Tools for Equivalent Data Model Mapping
Construction. Proc. of EDBT, March 1990, Venice.
[9] First International Workshop on Interoperability in Multidatabase Systems.
April 1991, Kyoto.
[10] M.R. Genesereth, N.P. Singh and M.A. Sayed: A Distributed and Anonymous
Knowledge Sharing Approach to Software Interoperation. Proc. of the
FGCS'94 Workshop on Heterogeneous Knowledge-Bases, Tokyo, Dec. 1994.
[11] M. Jarke: External Semantic Query Simplification: A Graph Theoretic
Approach and its Implementation in Prolog. In Expert Database Systems,
Kerschberg (ed.), Benjamin/Cummings Publishing Co, 1985.
[12] Y. Lien: Hierarchical schemata for relational databases. ACM Trans.
Database Syst., 6(1), 1981, pp. 48-69.
[13] L.L. Yan and T.W. Ling: Translating Schema With Constraints into OODB
Schema. North Holland, "In Semantic of Interoperable Systems", D. K.
Hsiao, E. J. Neuhold and R. S. Davis (eds.), 1993.
[14] M. Markowitz and V.M. Shoshani: On the correctness of representing
extended Entity-Relationship structures in the relational model.
Proc. of Int. Conf. on the Management of Data, Portland, 1989.
[15] M. Papazoglou, Z. Tari and N. Russel: Object Oriented Technology for
Inter-Schema and Language Mappings. To appear as a book chapter in O.
Bukhres and A.K. Elmagarmid (eds.), "Object Oriented Multidatabase
Systems: A Solution for Advanced Applications", Prentice Hall, 1994.
[16] S. Shenoy and Z. Ozsoyoglu: A System for Semantic Query Optimisation.
Proc. of the ACM- SIGMOD Conference on Management of Data, 1987.
[17] S. Shenoy and Z. Ozsoyoglu: Design and Implementation of a Semantic
Query Optimiser. IEEE Transactions on Knowledge and Data Engineering,
1(3), 1989.
[18] D. Tsichritzis and F. Lochovsky: Data Models, Chap. 14. Prentice-Hall,
Englewood Cliffs, N.J., 1982.
[19] A. Sheth, J. Larson: Federated Database Systems for Managing
Distributed, Heterogeneous, and Autonomous Databases. ACM Computing
Surveys, 22(3), Sept. 1990, pp. 183-236.
[20] Z. Tari: Interoperability Between Data Models. North Holland, "In
Semantic of Interoperable Systems", D. K. Hsiao, E. J. Neuhold and
R. S. Davis (Eds.), 1993.
[21] Z. Tari: ERC++: A Data Model that Combines Objects and Rules. Proc. of
Int. Conf. on Information and Knowledge Management, Washington, 1993.
[22] Z. Tari: On the Design of Object-Oriented Databases. In Proc. of Entity
Relationship Approach, G. Pernul and A.M. Tjoa (eds.), Springer-Verlag,
1992.
[23] Z. Tari and M. Orlowski: A Distributed Object Kernel for Interoperable
Databases.
Technical report, Queensland University of Technology, Brisbane, 1995.
[24] P. Coad and E. Yourdon: Object-Oriented Analysis, Object-Oriented Design.
Prentice Hall, 1990.
[25] C. Zaniolo: Multimodel external schemas for CODASYL data base management
systems. In "Data Base Architecture", G. Bracchi and G. Nijssen (eds.),
North-Holland, The Netherlands, 1979.