=== TSIMMIS PROJECT SUMMARY 1995 ===



ORGANIZATION:
Stanford University

SUBCONTRACTORS: None

PRINCIPAL INVESTIGATOR(S): Hector Garcia-Molina, hector@cs.stanford.edu, (415) 723-0685

TEAM MEMBERS / GRADUATE STUDENTS:

TITLE OF EFFORT: Tsimmis

SUBTITLE: An Integrated Information Management System

EXECUTIVE SUMMARY PARAGRAPH:
Tsimmis is developing a next-generation information management system for integrated access to a wide variety of information and knowledge sources. These include sources with unstructured data (e.g., newswire stories, files) and sources with structured data (e.g., relational and object-oriented databases). The project has developed a simple yet flexible model (OEM, the Object Exchange Model) for representing unstructured, dynamic information. Wrappers and mediators that perform semantic integration and transformation of information have also been implemented. The current focus is on techniques for rapidly developing wrappers and mediators from high-level descriptions of the required operations.
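To make the idea of a self-describing model concrete, here is a minimal Python sketch. It assumes the structure usually described for OEM, in which every object carries an object identifier, a label, a type, and a value that is either atomic or a set of subobjects. The class and helper names (`OEMObject`, `atomic`, `complex_obj`, `describe`) and the sample data are hypothetical and are not TSIMMIS code.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Union

_next_oid = count(1)  # hypothetical object-identifier generator for this sketch


@dataclass
class OEMObject:
    label: str  # describes what the object is, e.g. "title"
    type: str   # type tags used here: "string", "integer", or "set" for complex objects
    value: Union[str, int, List["OEMObject"]]
    oid: int = field(default_factory=lambda: next(_next_oid))


def atomic(label: str, value: Union[str, int]) -> OEMObject:
    # Infer the type tag from the Python value (a choice made for this sketch).
    return OEMObject(label, "integer" if isinstance(value, int) else "string", value)


def complex_obj(label: str, children: List[OEMObject]) -> OEMObject:
    return OEMObject(label, "set", list(children))


def describe(obj: OEMObject, depth: int = 0) -> List[str]:
    """Indented text rendering; each line carries the object's own label and type."""
    pad = "  " * depth
    if obj.type == "set":
        lines = [f"{pad}{obj.label} (set)"]
        for child in obj.value:
            lines.extend(describe(child, depth + 1))
        return lines
    return [f"{pad}{obj.label}: {obj.value!r} ({obj.type})"]


# Two "book" objects with different substructure; no common schema is declared.
b1 = complex_obj("book", [atomic("title", "Database Systems"), atomic("year", 1995)])
b2 = complex_obj("book", [atomic("title", "Mediators"),
                          atomic("author", "Smith"), atomic("author", "Jones")])
```

Because each value travels with its own label, the two `book` objects can differ in structure (one has a `year`, the other two `author` subobjects) without any shared schema.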

OBJECTIVE:
The main goal is to allow a decision maker to find heterogeneous information of interest, fuse information from different sources, and process it (e.g., summarize it, visualize it, discover trends). Another important goal is the easy incorporation of new sources, as well as the ability to deal with sources whose structure or services evolve.

APPROACH:
The approach is to build and demonstrate four components: wrappers, mediators, query normalizers, and constraint managers. Wrappers convert commands and information passing to and from sources into a common object model and command language. Mediators combine information from several sources, performing fusion and integration. Query normalizers extend the set of queries that a mediator or wrapper can handle: they logically deduce the answer to a new query from one or more of the queries the mediator or wrapper was designed to answer. Constraint managers enforce constraints, or detect violations of them, in a distributed fashion, using whatever capabilities the underlying sources provide. Technology for rapidly implementing wrappers and mediators from high-level descriptions of their functionality is also being developed. Finally, access to the heterogeneous sources is being provided through existing World Wide Web browsers, so that users can explore the information.
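The division of labor between wrappers and mediators can be sketched in a few lines of Python. This is a toy under stated assumptions: objects are lists of (label, value) pairs standing in for OEM, the two sources and their wrappers (`CatalogWrapper`, `ReviewWrapper`) are hypothetical, and the mediator's fusion rule (merge objects that share a `title`) is an illustrative choice, not the project's method.

```python
from typing import Dict, List, Optional, Tuple

# Stand-in for OEM in this sketch: an object is a list of (label, value) pairs.
Obj = List[Tuple[str, object]]


class CatalogWrapper:
    """Hypothetical wrapper for a source that stores (title, year) tuples."""

    def __init__(self, rows: List[Tuple[str, int]]):
        self.rows = rows

    def query(self, title: Optional[str] = None) -> List[Obj]:
        # Translate the source's native rows into the common representation.
        return [[("title", t), ("year", y)]
                for t, y in self.rows if title is None or t == title]


class ReviewWrapper:
    """Hypothetical wrapper for a source that stores dicts with its own field names."""

    def __init__(self, docs: List[Dict[str, str]]):
        self.docs = docs

    def query(self, title: Optional[str] = None) -> List[Obj]:
        # Map the source's "name"/"text" fields onto the common labels.
        return [[("title", d["name"]), ("review", d["text"])]
                for d in self.docs if title is None or d["name"] == title]


class FusionMediator:
    """Sends a query to every wrapper and fuses the objects that share a title."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, title: Optional[str] = None) -> List[Obj]:
        fused: Dict[object, Obj] = {}
        for wrapper in self.wrappers:
            for obj in wrapper.query(title):
                key = dict(obj)["title"]
                merged = fused.setdefault(key, [("title", key)])
                for label, value in obj:
                    if label != "title" and (label, value) not in merged:
                        merged.append((label, value))
        return list(fused.values())
```

With a catalog entry `("Mediators", 1995)` and a review `{"name": "Mediators", "text": "useful"}`, a mediator query for `"Mediators"` returns a single object combining the catalog's `year` with the review source's `review`.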

PROGRESS:
A simple yet powerful Object Exchange Model has been developed, together with an SQL-like query language. Based on these, several wrappers and mediators have been implemented and demonstrated on a collection of heterogeneous bibliographic information sources. A second version of MOBIE, a configurable web browsing tool for heterogeneous information, has been developed. Basic technology and an initial prototype for constraint management have been developed. Design of the wrapper generation toolkit has been completed, and a preliminary version is running. Preliminary definition of the mediator specification language has been completed, and an initial mediator generation toolkit (MedMaker) is under implementation. The theory of query normalization has been developed, based in part on algorithms for testing containment of conjunctive queries. For efficiency, wrappers and mediators must cache OEM objects; thus an OEM database system (LORE) has been implemented. (LORE can also be used in a stand-alone fashion.) LORE uses an extended SQL-like query language called LOREL.
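Query normalization is said to rest partly on containment of conjunctive queries. For reference, the classical test is: Q1 is contained in Q2 exactly when there is a containment mapping from Q2 to Q1, i.e., a substitution for Q2's variables that sends Q2's head to Q1's head and every atom of Q2's body to some atom of Q1's body. The Python sketch below implements that textbook test by backtracking. It is not the project's algorithm, and the query encoding (strings starting with an uppercase letter are variables, anything else a constant) is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Atom = Tuple[str, Tuple[str, ...]]           # (predicate, arguments)
Query = Tuple[Tuple[str, ...], List[Atom]]   # (head arguments, body atoms)


def is_var(term: str) -> bool:
    # Convention assumed here: variables start with an uppercase letter.
    return term[:1].isupper()


def _extend(src: Sequence[str], dst: Sequence[str],
            mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend mapping so each src term maps onto the dst term; None on conflict."""
    m = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if m.setdefault(s, d) != d:
                return None
        elif s != d:  # constants must match exactly
            return None
    return m


def contained_in(q1: Query, q2: Query) -> bool:
    """True if every answer of q1 is an answer of q2 (a containment mapping q2 -> q1 exists)."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    start = _extend(head2, head1, {})
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(body2):
            return True
        pred, args = body2[i]
        # Try every atom of q1 with the same predicate and arity as the target image.
        for pred1, args1 in body1:
            if pred1 == pred and len(args1) == len(args):
                m = _extend(args, args1, mapping)
                if m is not None and search(i + 1, m):
                    return True
        return False

    return search(0, start)
```

For example, `ans(X) :- book(X, Y), author(X, smith)` is contained in `ans(X) :- book(X, Z)`, but not the reverse. When a target query is contained in a query the source supports, the supported query returns a superset of the desired answer that can then be filtered.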

PRODUCTS: None.

FY95 ACCOMPLISHMENTS:

  1. Implemented a second version of MOBIE, with an improved interface, facilities for frequently used queries, and a display that can be configured according to the labels of the displayed objects.
  2. Completed the design of the wrapper generator toolkit. Based on query "templates" defined by the user, it recognizes large classes of queries and converts them into native queries for the underlying source. The toolkit also provides facilities for parsing answers from a source and for extracting the fields that make up result objects.
  3. Completed preliminary definition of mediator specification language. With this language, one can define many of the semantic integration and transformation tasks that a mediator must perform.
  4. Designed and implemented LORE, an OEM database system for caching objects in wrappers and mediators for increased efficiency. LORE's query language, LOREL, was developed, providing an intuitive, yet powerful interface to an OEM repository.
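Item 3 describes integration tasks that are specified declaratively rather than coded by hand. The sketch below illustrates that idea only: the rule format (`export_label`, `source`, `where`, `rename`) and the source name `staff_db` are hypothetical and are not the syntax of the TSIMMIS mediator specification language. A single generic interpreter, `apply_rule`, carries out whatever the rule describes.

```python
from typing import Dict, List

# Hypothetical declarative rule: read records from one source, keep those that
# satisfy the "where" conditions, and rename source labels into the mediator's vocabulary.
RULE = {
    "export_label": "cs_person",
    "source": "staff_db",
    "where": {"dept": "CS"},
    "rename": {"ename": "name", "relation": "rel"},
}


def apply_rule(rule: Dict, sources: Dict[str, List[Dict]]) -> List[Dict]:
    """Generic interpreter: the integration logic lives in the rule, not in this code."""
    out = []
    for rec in sources[rule["source"]]:
        if all(rec.get(k) == v for k, v in rule["where"].items()):
            # Drop the selection labels and rename the remaining ones.
            fields = {rule["rename"].get(k, k): v
                      for k, v in rec.items() if k not in rule["where"]}
            out.append({rule["export_label"]: fields})
    return out
```

Changing which records a mediator exports, or how their labels are renamed, then means editing the rule rather than the program.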

SELECTED PUBLICATIONS:

DATE PREPARED: June 30, 1995

=== ADMINISTRATIVE DATA ===

1. ARPA ORDER NUMBER: A004

2. BAA NUMBER: BAA 92-06

3. CONTRACT/GRANT NUMBER: F33615-93-1-1339-P00001

4. AGENT: Department of the Air Force, Wright Laboratory

5. CONTRACT TITLE: TSIMMIS: An Integrated Information Management System

6. CONTRACTOR/ORGANIZATION: Stanford University

7. SUBCONTRACTORS: None

8. CO-PRINCIPAL INVESTIGATORS: Hector Garcia-Molina

9. ACTUAL START DATE: 06/01/93

10. EXPECTED END DATE: 09/30/96

11. FUNDING PROFILE:
11.1. Current contract: $$$
11.2. Options (Not exercised): None
11.3. Total funds provided to date for all years: $$$
Total funds expended to date: $$$
As of date: 05/31/95
11.4. Date total current funding will be expended: 10/31/95
11.5. Funds required in FY96: $$$

12. ANYTHING ELSE YOU NEED (from ARPA): Nothing

===SIGNIFICANT EVENTS===

BROWSER FOR HETEROGENEOUS INFORMATION DEVELOPED
End users often have to explore heterogeneous information sources that contain unstructured information. We have developed MOBIE, a graphical interface based on HTTP. MOBIE is a platform-independent tool for displaying and exploring the OEM objects returned as the result of queries. (OEM is a self-describing, flexible model for representing unstructured information.) MOBIE provides a mechanism for navigating through objects, zooming in on their nested substructures as necessary. Users do not need to know the structure or schema of the data in advance. MOBIE uses HTML markup to format the portion of the object space being displayed, using indentation and links to represent the data structure and relationships. End users access MOBIE through a standard World Wide Web browser such as Mosaic or Netscape. An important advantage of building MOBIE on WWW browsers is their widespread use and universal accessibility.
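The following Python sketch shows one way nested objects could be turned into HTML in the manner described, with nested lists providing indentation and links standing in for substructure that has not yet been expanded. The (oid, label, value) tuple format, the `render` function, and the `/object?oid=` link scheme are assumptions for illustration; this is not MOBIE's code or URL format.

```python
import html
from typing import List, Tuple, Union

# Object form assumed for this sketch: (oid, label, value), where value is
# atomic (string/number) or a list of subobjects.
Obj = Tuple[int, str, Union[str, int, List["Obj"]]]


def render(obj: Obj, depth: int = 1, url: str = "/object?oid={}") -> str:
    """Render obj as nested HTML lists, expanding `depth` levels of substructure.

    Complex subobjects beyond that depth become links the user can follow to
    zoom in; `url` is a hypothetical address scheme.
    """
    oid, label, value = obj
    name = html.escape(label)
    if not isinstance(value, list):
        return f"<li>{name}: {html.escape(str(value))}</li>"
    if depth == 0:
        # Collapsed complex object: show only a link to its own page.
        return f'<li><a href="{url.format(oid)}">{name}</a></li>'
    items = "".join(render(child, depth - 1, url) for child in value)
    return f"<li>{name}<ul>{items}</ul></li>"
```

Because only labels and nesting drive the output, the renderer needs no schema; a larger `depth` simply expands more of the object space into the page.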

WRAPPER GENERATION TOOLKIT DEVELOPED
The TSIMMIS Project has developed a toolkit for rapidly building wrappers. The toolkit contains a library of commonly used functions, such as functions for receiving queries from the application and packaging results. It also contains facilities for translating queries into source-specific commands and queries, and for translating results into a model useful to the application. The central component of the toolkit is the query translation component, called the converter. The implementor gives the converter a set of templates that describe the queries accepted by the wrapper. If an application query matches a template, an implementor-provided action associated with the template is executed to produce the native query for the underlying source. The converter can also handle some queries that do not exactly match any template by producing a filter that post-processes the source results. The toolkit allows wrappers for new sources to be implemented rapidly, with minimal programming effort.
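The exact-match path of the converter can be sketched as follows. The simplifications are assumptions: a query is reduced to a set of equality conditions, a template is identified by the attributes it accepts, and the native command syntax (`FIND AUTHOR ...`) belongs to a hypothetical bibliographic source. The post-processing filter for non-matching queries is omitted here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A template pairs the attributes it accepts (each acting as a placeholder)
# with an implementor-supplied action that emits the native command.
Template = Tuple[frozenset, Callable[[Dict[str, str]], str]]


def convert(query: Dict[str, str], templates: List[Template]) -> Optional[str]:
    """Return the native command for the first exactly matching template, else None."""
    for attrs, action in templates:
        if attrs == frozenset(query):
            return action(query)  # the query's constants bind the placeholders
    return None


# Hypothetical templates for a bibliographic source with a keyword command syntax.
TEMPLATES: List[Template] = [
    (frozenset({"author"}), lambda b: f"FIND AUTHOR {b['author']}"),
    (frozenset({"author", "year"}),
     lambda b: f"FIND AUTHOR {b['author']} AND YEAR {b['year']}"),
    (frozenset({"title"}), lambda b: f"FIND TITLE \"{b['title']}\""),
]
```

A query on `author` and `year` thus becomes `FIND AUTHOR Smith AND YEAR 1995`, while a query on an unsupported attribute such as `publisher` matches no template and returns `None`.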

OBJECT REPOSITORY AND QUERY LANGUAGE FOR SEMISTRUCTURED INFORMATION
TSIMMIS uses a simple yet powerful data model called OEM (for Object Exchange Model). OEM is well-suited for representing heterogeneous and semistructured information. We have designed and implemented a repository for OEM objects called LORE (for Lightweight Object Repository). LORE includes standard database system features such as a storage manager, buffer manager, bulk loader, and query optimizer and executor. LORE does not include "heavyweight" features such as concurrency control and recovery, since these features are not needed for LORE's current uses. We have extended the kernel query language of TSIMMIS (called OEM-QL) to a more powerful language (called LOREL, for LORE Language) that supports queries with advanced features such as subqueries, aggregates, and result restructuring. LORE and LOREL are useful within the TSIMMIS architecture in several places: to store very large query results for browsing, to save query results for future use, to store intermediate results during wrapper and mediator execution, and to quickly integrate new information sources.
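Path navigation over semistructured data, central to querying an OEM repository, can be sketched briefly. The `follow` and `select` functions below, the (label, value) object encoding, and the sample database are hypothetical illustrations in the spirit of LOREL's path expressions; they are not LOREL syntax or LORE's evaluator.

```python
from typing import Iterable, List, Tuple, Union

# Object encoding assumed for this sketch: (label, value), where value is atomic
# or a list of subobjects. Labels may repeat and structure may vary per object.
Obj = Tuple[str, Union[str, int, List["Obj"]]]


def follow(objs: Iterable[Obj], path: str) -> List[Obj]:
    """Return all objects reachable from objs along a dotted label path.

    A missing label yields no results rather than an error, so one query
    tolerates objects whose structure differs.
    """
    current = list(objs)
    for label in path.split("."):
        current = [child for _, value in current if isinstance(value, list)
                   for child in value if child[0] == label]
    return current


def select(root: Obj, path: str, cond_label: str, cond_value) -> List[Obj]:
    """Objects at `path` that have a subobject (cond_label, cond_value)."""
    return [o for o in follow([root], path)
            if isinstance(o[1], list) and (cond_label, cond_value) in o[1]]


db = ("biblio", [
    ("book", [("title", "Mediators"), ("author", "Smith"),
              ("author", "Jones"), ("year", 1995)]),
    ("book", [("title", "Wrappers"), ("author", "Lee")]),  # no "year" here
    ("article", [("title", "OEM")]),
])
```

Note that `follow([db], "book.year")` simply skips the book that has no `year` subobject, which is the kind of tolerance to irregular structure the model is meant to provide.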

===FY 96 EVENTS===

  1. MEDIATOR GENERATION TOOLKIT: The Mediator Generation Toolkit (MedMaker) will allow users to describe, in a non-procedural way, the integration and transformation tasks that a mediator must perform. From this description, plus a few functions defined as procedures, MedMaker will generate a mediator.
  2. QUERY NORMALIZER FOR WRAPPERS: A query normalizer extends the functionality of a wrapper. The normalizer takes as input a query destined for a source. It compares the query against templates that describe the functionality of the source and determines whether the target query can be answered through a "broader" query that returns more information than is desired. If so, the broader query is submitted to the source, and the returned data is filtered to reduce the answer to the correct one.
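Item 2's strategy (submit a broader supported query, then filter) can be sketched in Python. The assumptions: queries are equality conditions on attributes, each supported template is the set of attributes the source can evaluate, and among applicable templates the one covering the most conditions is chosen so that less data must be filtered. The names `normalize` and `answer`, and the sample source, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]


def normalize(query: Record, templates: List[frozenset]) -> Optional[Tuple[Record, Record]]:
    """Split a query into (broader supported query, residual conditions to filter on).

    Any template whose attributes are a subset of the query's yields a broader
    query; the one covering the most conditions is preferred.
    """
    best = None
    for attrs in templates:
        if attrs <= set(query) and (best is None or len(attrs) > len(best)):
            best = attrs
    if best is None:
        return None
    broader = {k: v for k, v in query.items() if k in best}
    residual = {k: v for k, v in query.items() if k not in best}
    return broader, residual


def answer(query: Record, templates: List[frozenset],
           source: Callable[[Record], List[Record]]) -> Optional[List[Record]]:
    plan = normalize(query, templates)
    if plan is None:
        return None  # no supported query is broader than the requested one
    broader, residual = plan
    # Submit the broader query, then filter its larger answer down to the requested one.
    return [r for r in source(broader)
            if all(r.get(k) == v for k, v in residual.items())]


# Hypothetical source that can only evaluate conditions on "author" or "title".
TEMPLATES = [frozenset({"author"}), frozenset({"title"})]
BOOKS = [
    {"author": "Smith", "year": "1995", "title": "A"},
    {"author": "Smith", "year": "1994", "title": "B"},
    {"author": "Jones", "year": "1995", "title": "C"},
]


def source(q: Record) -> List[Record]:
    return [r for r in BOOKS if all(r.get(k) == v for k, v in q.items())]
```

Here a query on `author` and `year` is sent to the source as an `author`-only query, and the `year` condition is applied afterwards to the returned records.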

===TECHNOLOGY TRANSITION===

TECHNOLOGY TRANSITION:
An effort is starting in the summer of 1995 to transition TSIMMIS technology to the Air Force Alternative-Fueled Vehicles (AFV) program. TASC (a private contractor), working with the Alternative-Fueled Vehicles Sponsored Project Office (AFVSPO) located at Warner Robins AFB, will manage the effort. Stanford will provide TSIMMIS technology and work with the TASC team. The goal is to provide access to sources containing AFV data, such as vendor advertisements and test data from vehicles already in use. Several wrappers and mediators will be implemented for this application, and access will be provided through MOBIE. The point of contact for the TSIMMIS part of this effort, as well as for information on use of TSIMMIS technology, is Dr. Joachim Hammer at (415) 723-3118 or joachim@db.stanford.edu.

===QUAD CHARTS===

Picture
See TSIMMIS home page.

Goals/Payoff

Technical Challenges

Schedule and Milestones

Background

Technical Approach

===ADDRESSES===

Additional information on the TSIMMIS Project can be found at URL http://db.stanford.edu.



DATE PREPARED: August 21, 1995

