ORGANIZATION: Stanford University
PRINCIPAL INVESTIGATOR(S): Hector Garcia-Molina, email@example.com, (415) 723-0685
TEAM MEMBERS / GRADUATE STUDENTS:
TITLE OF EFFORT: Tsimmis
SUBTITLE: An Integrated Information Management System
EXECUTIVE SUMMARY PARAGRAPH:
Tsimmis is developing a next-generation information management system for integrated access to a wide variety of information and knowledge sources, including ones with unstructured data (e.g., newswire stories, files) and structured data (e.g., relational and object-oriented databases). The project has developed a simple yet flexible model (OEM) for representing unstructured, dynamic information. Wrappers and mediators that perform semantic integration and transformation of information have also been implemented. The current focus is on techniques for rapid development of wrappers and mediators, based on high-level descriptions of the required operations.
The main goal is to allow a decision maker to find heterogeneous information of interest, fuse information from different sources, and process it (e.g., summarize it, visualize it, discover trends). Another important goal is the easy incorporation of new sources, as well as the ability to deal with sources whose structure or services evolve.
The approach is to build and demonstrate four components: wrappers, mediators, query normalizers, and constraint managers. Wrappers translate commands and information between the sources and a common object model and command language. Mediators can combine information from several sources, performing fusion and integration. Query normalizers extend the set of queries that a mediator or wrapper can handle by logically deducing the answers to new queries from one or more of the queries the mediator or wrapper was designed to answer. Constraint managers enforce constraints or detect their violations in a distributed fashion, using whatever capabilities are available at the underlying sources. The technology for rapid implementation of wrappers and mediators, based on high-level descriptions of their functionality, is also being developed. Finally, access to the heterogeneous sources is being provided through existing World Wide Web browsers, so that users can explore the information.
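The division of labor between wrappers and mediators can be illustrated with a minimal Python sketch. It is not TSIMMIS code: the `Record` type, the two example sources, and the `query(author)` interface are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical common-model record: a label plus (name, value) pairs."""
    label: str
    fields: tuple


class DelimitedWrapper:
    """Wraps an assumed source whose native rows are 'author|title' strings."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, author):
        out = []
        for row in self.rows:
            a, t = row.split("|")
            if a == author:
                out.append(Record("book", (("author", a), ("title", t))))
        return out


class DictWrapper:
    """Wraps an assumed source whose native rows are dicts with other key names."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, author):
        return [
            Record("book", (("author", r["by"]), ("title", r["name"])))
            for r in self.rows
            if r["by"] == author
        ]


class Mediator:
    """Sends one query to every wrapper and fuses the answers, dropping duplicates."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, author):
        seen, fused = set(), []
        for wrapper in self.wrappers:
            for rec in wrapper.query(author):
                if rec not in seen:
                    seen.add(rec)
                    fused.append(rec)
        return fused
```

Because both wrappers return the same record type, the mediator needs no knowledge of either native format. In this sketch, adding a source means writing one more wrapper class.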
A simple yet powerful Object Exchange Model has been developed, together with an SQL-like query language. Based on these, several wrappers and mediators have been implemented and demonstrated on a collection of heterogeneous bibliographic information sources. A second version of MOBIE, a configurable web browsing tool for heterogeneous information, has been developed. Basic technology and an initial prototype for constraint management have been developed. Design of the wrapper generation toolkit has been completed, and a preliminary version is running. Preliminary definition of the mediator specification language has been completed, and an initial mediator generation toolkit (MedMaker) is under implementation. The theory of query normalization has been developed, based in part on algorithms for testing containment of conjunctive queries. For efficiency, wrappers and mediators must cache OEM objects; thus an OEM database system (LORE) has been implemented. (LORE can also be used in a stand-alone fashion.) LORE uses an extended SQL-like query language called LOREL.
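As background on conjunctive query containment, the sketch below implements the textbook homomorphism criterion: query Q1 is contained in Q2 exactly when the variables of Q2 can be mapped to the variables of Q1 so that Q2's head maps onto Q1's head and every body atom of Q2 maps onto a body atom of Q1. This is a brute-force illustration, not the project's algorithm. The query representation (a head tuple plus a list of `(predicate, args)` atoms, variables only, no constants) is an assumption.

```python
from itertools import product


def contained_in(q1, q2):
    """Return True if conjunctive query q1 is contained in q2.

    A query is (head_vars, body), where body is a list of
    (predicate, args) atoms and every argument is a variable name.
    The search tries every mapping, so it is exponential in the
    number of variables of q2.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars1 = sorted({v for _, args in body1 for v in args} | set(head1))
    vars2 = sorted({v for _, args in body2 for v in args} | set(head2))
    atoms1 = set(body1)
    for image in product(vars1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        # The mapping must send q2's head onto q1's head.
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        # Every atom of q2 must land on some atom of q1.
        if all((p, tuple(h[a] for a in args)) in atoms1 for p, args in body2):
            return True
    return False


# ans(X) :- edge(X, Y), edge(Y, Z)   nodes that start a path of length two
two_step = (("X",), [("edge", ("X", "Y")), ("edge", ("Y", "Z"))])
# ans(X) :- edge(X, Y)               nodes that have any outgoing edge
one_step = (("X",), [("edge", ("X", "Y"))])
```

Here `contained_in(two_step, one_step)` holds and the reverse does not. A component that supports only `one_step` could therefore answer `two_step` by post-processing its own results, which is the kind of deduction a query normalizer relies on.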
DATE PREPARED: June 30, 1995
1. ARPA ORDER NUMBER: A004
2. BAA NUMBER: BAA 92-06
3. CONTRACT/GRANT NUMBER: F33615-93-1-1339-P00001
4. AGENT: Department of the Air Force, Wright Laboratory
5. CONTRACT TITLE: TSIMMIS: An Integrated Information Management System
6. CONTRACTOR/ORGANIZATION: Stanford University
7. SUBCONTRACTORS: None
8. CO-PRINCIPAL INVESTIGATORS: Hector Garcia-Molina
9. ACTUAL START DATE: 06/01/93
10. EXPECTED END DATE: 09/30/96
11. FUNDING PROFILE:
11.1. Current contract: $$$
11.2. Options (Not exercised): None
11.3. Total funds provided to date for all years: $$$
Total funds expended to date: $$$
As of date: 05/31/95
11.4. Date total current funding will be expended: 10/31/95
11.5. Funds required in FY96: $$$
12. ANYTHING ELSE YOU NEED (from ARPA): Nothing
BROWSER FOR HETEROGENEOUS INFORMATION DEVELOPED
End users often have to explore heterogeneous information sources that contain unstructured information. We have developed a graphical interface based on the HTTP protocol called MOBIE. MOBIE is a platform-independent tool for displaying and exploring OEM objects that are returned as a result of queries. (OEM is a self-describing, flexible model for representing unstructured information.) MOBIE provides a mechanism for navigating through objects, zooming in on their nested substructures as necessary. The user does not need to know in advance the structure or schema of the data being explored. MOBIE uses HTML commands to format the portion of the object space being displayed, using indentation and links to represent the data structure and relationships. The end user actually uses MOBIE through a standard World Wide Web browser such as Mosaic or Netscape. An important advantage of WWW browsers as the basis for MOBIE is their widespread use and capacity for universal access.
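The idea of formatting a nested object as indented HTML, with links for zooming in, can be sketched as follows. This is not MOBIE's code: the `(label, value)` tuple representation, the `render` function, and the `?path=` link scheme are all assumptions.

```python
from html import escape
from urllib.parse import quote


def render(obj, path="", depth=1):
    """Render a (label, value) object as a nested HTML list item.

    value is either an atomic value or a list of sub-objects.
    Sub-objects nested deeper than `depth` levels are shown as links,
    so the user can zoom in without knowing the structure in advance.
    """
    label, value = obj
    here = f"{path}/{label}"
    if not isinstance(value, list):
        return f"<li>{escape(label)}: {escape(str(value))}</li>"
    if depth == 0:
        # Hypothetical link scheme: a follow-up request would render this subtree.
        return f'<li><a href="?path={quote(here)}">{escape(label)}</a></li>'
    items = "".join(render(child, here, depth - 1) for child in value)
    return f"<li>{escape(label)}<ul>{items}</ul></li>"


doc = ("library", [
    ("book", [
        ("title", "Databases & Knowledge"),
        ("authors", [("author", "J. Smith")]),
    ]),
])
```

Calling `render(doc, depth=1)` shows the top level and turns `book` into a link, while a larger `depth` expands more of the structure in place.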
WRAPPER GENERATION TOOLKIT DEVELOPED
The TSIMMIS Project has developed a wrapper implementation toolkit for rapidly building wrappers. The toolkit contains a library of commonly used functions, such as routines for receiving queries from the application and packaging results. It also contains a facility for translating queries into source-specific commands and queries, and for translating results into a model useful to the application. The central component of the toolkit is the query translation component, called the converter. The implementor gives the converter a set of templates that describe the queries accepted by the wrapper. If an application query matches a template, an implementor-provided action associated with the template is executed to produce the native query for the underlying source. The converter can also handle some queries that do not exactly match templates, by producing a filter for post-processing the source results. The toolkit allows rapid implementation of wrappers for new sources, with minimal programming effort.
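The converter's template matching and filter generation might look roughly like the sketch below. It is an illustration under stated assumptions, not the toolkit's design: queries are modeled as field-to-value dictionaries, a template is the set of fields the source can search on, and the native command syntax is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass
class Template:
    fields: FrozenSet[str]                    # fields the source can search on
    action: Callable[[Dict[str, str]], str]   # implementor-provided native query builder


def convert(query, templates):
    """Return (native_query, post_filter), or None if no template applies.

    A template applies when all of its fields appear in the query.
    Conditions the template cannot express are left to post_filter,
    which is run over the rows the source returns.
    """
    best = None
    for t in templates:
        if t.fields.issubset(query) and (best is None or len(t.fields) > len(best.fields)):
            best = t
    if best is None:
        return None
    pushed = {f: query[f] for f in best.fields}
    leftover = {f: v for f, v in query.items() if f not in best.fields}

    def post_filter(rows):
        return [r for r in rows if all(r.get(f) == v for f, v in leftover.items())]

    return best.action(pushed), post_filter


# Hypothetical source that can only search by author.
templates = [Template(frozenset({"author"}), lambda q: f"find author={q['author']}")]
```

For a query on both author and year, `convert` produces the author-only native query plus a filter that keeps only rows with the requested year. An exact match yields a filter with no leftover conditions, which returns the rows unchanged.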
OBJECT REPOSITORY AND QUERY LANGUAGE FOR SEMISTRUCTURED INFORMATION
TSIMMIS uses a simple yet powerful data model called OEM (for Object Exchange Model). OEM is well-suited for representing heterogeneous and semistructured information. We have designed and implemented a repository for OEM objects called LORE (for Lightweight Object Repository). LORE includes standard database system features such as a storage manager, buffer manager, bulk loader, and query optimizer and executor. LORE does not include "heavyweight" features such as concurrency control and recovery, since these features are not needed for LORE's current uses. We have extended the kernel query language of TSIMMIS (called OEM-QL) to a more powerful language (called LOREL, for LORE Language) that supports queries with advanced features such as subqueries, aggregates, and result restructuring. LORE and LOREL are useful within the TSIMMIS architecture in several places: to store very large query results for browsing, to save query results for future use, to store intermediate results during wrapper and mediator execution, and to quickly integrate new information sources.
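To make the notion of self-describing, semistructured objects concrete, here is a minimal sketch of labeled objects and a path-style lookup over them. It assumes each object carries a label and a value that is either atomic or a set of sub-objects. The `OEMObject` class and the `select` function are illustrative names, and the dotted path syntax is not LOREL.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    label: str
    value: Union[str, int, float, List["OEMObject"]]

    @property
    def type(self):
        # The type is derived from the value, so each object describes itself.
        return "set" if isinstance(self.value, list) else type(self.value).__name__


def select(root, path):
    """Return the values of all objects reached by following the
    dot-separated labels in `path`, starting at `root`."""
    labels = path.split(".")
    frontier = [root] if root.label == labels[0] else []
    for label in labels[1:]:
        frontier = [
            child
            for obj in frontier
            if isinstance(obj.value, list)
            for child in obj.value
            if child.label == label
        ]
    return [obj.value for obj in frontier]


# The two books have different fields; no schema has to be declared first.
library = OEMObject("library", [
    OEMObject("book", [OEMObject("title", "A"), OEMObject("year", 1995)]),
    OEMObject("book", [OEMObject("title", "B")]),
])
```

`select(library, "library.book.title")` returns both titles, while `select(library, "library.book.year")` returns only the one year that exists. Objects that lack a field are simply skipped, which is the behavior irregular data calls for.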
An effort is starting in the summer of 1995 to transition TSIMMIS technology to the Air Force Alternative-Fueled Vehicles (AFV) program. TASC (a private contractor), working with the Alternative-Fueled Vehicles Sponsored Project Office (AFVSPO) located at Warner Robins AFB, will manage the effort. Stanford will provide TSIMMIS technology and work with the TASC team. The goal is to provide access to sources containing AFV data such as vendor advertisements and test data from vehicles already in use. Several wrappers and mediators will be implemented for this application, and access will be provided through MOBIE. The point of contact for the TSIMMIS part of this effort, as well as for information on use of TSIMMIS technology, is Dr. Joachim Hammer at (415) 723-3118 or firstname.lastname@example.org.
See TSIMMIS home page.
Schedule and Milestones
Additional information on the TSIMMIS Project can be found at URL http://db.stanford.edu.
DATE PREPARED: August 21, 1995