ORGANIZATION: Stanford University
SUBCONTRACTORS: None
PRINCIPAL INVESTIGATOR(S): Hector Garcia-Molina, hector@cs.stanford.edu, (415) 723-0685
TEAM MEMBERS / GRADUATE STUDENTS:
TITLE OF EFFORT: TSIMMIS
SUBTITLE: An Integrated Information Management System
EXECUTIVE SUMMARY PARAGRAPH:
TSIMMIS is developing a next-generation information management system
for integrated access to a wide variety of information and knowledge
sources, including ones with unstructured data (e.g., newswire
stories, files) and structured data (e.g., relational and
object-oriented databases). The project has developed a simple, yet flexible
model (OEM) for representing unstructured, dynamic information.
Wrappers and mediators that perform semantic integration and
transformation of information have also been implemented. The current
focus is on techniques for rapid development of wrappers and
mediators, based on high-level descriptions of the required operations.
OBJECTIVE:
The main goal is to allow a decision maker to find heterogeneous
information of interest, fuse information from different sources, and
process it (e.g., summarize it, visualize it, discover trends).
Another important goal is the easy incorporation of new sources, as
well as the ability to deal with sources whose structure or services evolve.
APPROACH:
The approach is to build and demonstrate four components: wrappers,
mediators, query normalizers, and constraint managers. Wrappers
convert commands and information to and from sources into a common
object model and command language. Mediators can combine information
from several sources, performing fusion and integration. Query
normalizers extend the set of queries that a mediator or wrapper can
handle by logically deducing the answers from one or more of the
queries the mediator or wrapper has been designed to answer.
Constraint managers enforce constraints or
detect their violations in a distributed fashion, using whatever
capabilities are available at the underlying sources. The technology
for rapid implementation of wrappers and mediators, based on
high-level descriptions of their functionality, is also being developed.
Finally, access to the heterogeneous sources is being provided through
existing World Wide Web browsers, so that users can explore the information.
PROGRESS:
A simple yet powerful Object Exchange Model has been developed, together
with an SQL-like query language. Based on these, several wrappers and
mediators have been implemented and demonstrated on a collection of
heterogeneous bibliographic information sources. A second version of MOBIE,
a configurable web browsing tool for heterogeneous information, has been
developed. Basic technology and an initial prototype for constraint
management have been developed. Design of the wrapper generation toolkit has
been completed, and a preliminary version is running. Preliminary definition
of the mediator specification language has been completed, and an initial
mediator generation toolkit (MedMaker) is under implementation. The
theory of query normalization has been developed, based in part on
algorithms for testing containment of conjunctive queries. For
efficiency, wrappers and mediators must cache OEM objects; thus an OEM
database system (LORE) has been implemented. (LORE can also be used
in a stand-alone fashion.) LORE uses an extended SQL-like query
language called LOREL.
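
As an illustration of the containment test mentioned above, the
following Python sketch uses the standard textbook homomorphism check
for conjunctive queries. It is not the TSIMMIS implementation. The
query encoding (tuples of predicate names and arguments, with
capitalized arguments treated as variables) and the names is_var,
unify, and contained_in are assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed encoding: an atom is (predicate, arguments); a query is
# (head arguments, body atoms). Arguments starting with an uppercase
# letter are variables; everything else is a constant.
Atom = Tuple[str, Tuple[str, ...]]
CQ = Tuple[Tuple[str, ...], List[Atom]]


def is_var(term: str) -> bool:
    return term[:1].isupper()


def unify(src: Sequence[str], dst: Sequence[str],
          mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend mapping so that each src term maps onto the matching dst term."""
    if len(src) != len(dst):
        return None
    extended = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            # A variable must map to the same target everywhere it occurs.
            if extended.setdefault(s, d) != d:
                return None
        elif s != d:
            # A constant can only map to itself.
            return None
    return extended


def contained_in(q1: CQ, q2: CQ) -> bool:
    """True if every answer to q1 is also an answer to q2.

    Holds exactly when some mapping sends the head of q2 onto the head
    of q1 and every body atom of q2 onto a body atom of q1.
    """
    head1, body1 = q1
    head2, body2 = q2
    start = unify(head2, head1, {})
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(body2):
            return True
        pred, args = body2[i]
        for other_pred, other_args in body1:
            if other_pred == pred:
                extended = unify(args, other_args, mapping)
                if extended is not None and search(i + 1, extended):
                    return True
        return False

    return search(0, start)


# Hypothetical bibliographic queries:
#   narrow(X) :- author(X, ullman), year(X, 1995)
#   broad(X)  :- author(X, Y)
narrow: CQ = (("X",), [("author", ("X", "ullman")), ("year", ("X", "1995"))])
broad: CQ = (("X",), [("author", ("X", "Y"))])
```

Here contained_in(narrow, broad) holds, so a component that can answer
the broad query could in principle answer the narrow one by filtering
its results. The search is exponential in the worst case, which is
acceptable only for small queries such as these.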
PRODUCTS: None.
FY95 ACCOMPLISHMENTS:
SELECTED PUBLICATIONS:
DATE PREPARED: June 30, 1995
1. ARPA ORDER NUMBER: A004
2. BAA NUMBER: BAA 92-06
3. CONTRACT/GRANT NUMBER: F33615-93-1-1339-P00001
4. AGENT: Department of the Air Force, Wright Laboratory
5. CONTRACT TITLE: TSIMMIS: An Integrated Information Management System
6. CONTRACTOR/ORGANIZATION: Stanford University
7. SUBCONTRACTORS: None
8. CO-PRINCIPAL INVESTIGATORS: Hector Garcia-Molina
9. ACTUAL START DATE: 06/01/93
10. EXPECTED END DATE: 09/30/96
11. FUNDING PROFILE:
11.1. Current contract: $$$
11.2. Options (Not exercised): None
11.3. Total funds provided to date for all years: $$$
Total funds expended to date: $$$
As of date: 05/31/95
11.4. Date total current funding will be expended: 10/31/95
11.5. Funds required in FY96: $$$
12. ANYTHING ELSE YOU NEED (from ARPA): Nothing
BROWSER FOR HETEROGENEOUS INFORMATION DEVELOPED
End users often have to explore heterogeneous information sources
that contain unstructured information. We have developed MOBIE, a
graphical interface based on the HTTP protocol. MOBIE is a
platform-independent tool for displaying and exploring OEM objects
that are returned as a result of queries. (OEM is a self-describing,
flexible model for representing unstructured information.) MOBIE
provides a mechanism for navigating through objects, zooming in on
their nested substructures as necessary. The user does not need to know
in advance the structure or schema of the data being explored. MOBIE
uses HTML commands to format the portion of the object space being
displayed, using indentation and links to represent the data
structure and relationships. The end user actually uses MOBIE
through a standard World Wide Web browser such as Mosaic or Netscape.
An important advantage of WWW browsers as the basis for MOBIE is
their widespread use and capacity for universal access.
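
The display approach described above can be sketched roughly as
follows in Python. This is not MOBIE's code. The OEMObject class, its
fields, the render_html function, and the /object/ URL scheme are all
hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class OEMObject:
    """Hypothetical self-describing object: a label plus either an
    atomic value or a list of nested subobjects."""
    oid: str
    label: str
    value: Union[str, int, float, List["OEMObject"]]


def render_html(obj: OEMObject, depth: int = 0, max_depth: int = 1) -> str:
    """Render obj as nested HTML lists.

    Nesting in the HTML mirrors nesting in the data. Complex subobjects
    at max_depth are shown as links so the user can zoom in on them.
    """
    if not isinstance(obj.value, list):
        # Atomic object: show its label and value directly.
        return f"<li>{escape(obj.label)}: {escape(str(obj.value))}</li>"
    if depth >= max_depth:
        # Assumed URL scheme for requesting a subobject by identifier.
        return (f'<li><a href="/object/{escape(obj.oid)}">'
                f"{escape(obj.label)}</a></li>")
    children = "".join(render_html(child, depth + 1, max_depth)
                       for child in obj.value)
    return f"<li>{escape(obj.label)}<ul>{children}</ul></li>"


# Hypothetical bibliographic object with one nested substructure.
book = OEMObject("o1", "book", [
    OEMObject("o2", "title", "Databases"),
    OEMObject("o3", "author", [OEMObject("o4", "name", "Smith")]),
])
page = "<ul>" + render_html(book) + "</ul>"
```

No schema is consulted anywhere in the sketch: the labels carried by
the objects themselves determine what is displayed, which is why the
structure of the data need not be known in advance.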
WRAPPER GENERATION TOOLKIT DEVELOPED
The TSIMMIS Project has developed a wrapper implementation toolkit for
rapidly building wrappers. The toolkit contains a library of commonly
used functions, such as those for receiving queries from the application and
packaging results. It also contains a facility for translating
queries into source-specific commands and queries, and for translating
results into a model useful to the application. The central component
of the toolkit is the query translation component, called the converter.
The implementor gives the converter a set of templates that describe
the queries accepted by the wrapper. If an application query matches
a template, an implementor-provided action associated with the
template is executed to produce the native query for the underlying
source. The converter can also handle some queries that do not
exactly match templates, by producing a filter for post-processing
the source results. The toolkit allows rapid implementation of wrappers
for new sources, with minimal programming effort.
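
The template-and-filter idea can be sketched in Python as follows.
This is a simplified illustration, not the toolkit's converter: it
assumes queries are plain equality conditions on attributes, and the
Template class, the convert function, and the "FIND AUTHOR" native
command are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Query = Dict[str, str]    # attribute -> required value (equality only)
Record = Dict[str, str]   # one result record from the source


@dataclass
class Template:
    attrs: FrozenSet[str]            # attributes the source can search on
    action: Callable[[Query], str]   # builds the native command text


def convert(query: Query, templates: List[Template]
            ) -> Optional[Tuple[str, Callable[[Record], bool]]]:
    """Return (native_command, post_filter), or None if no template applies.

    The template covering the most query conditions is chosen. Any
    conditions it does not cover are checked by the returned filter
    after the source has answered.
    """
    usable = [t for t in templates if t.attrs and t.attrs <= set(query)]
    if not usable:
        return None
    best = max(usable, key=lambda t: len(t.attrs))
    pushed = {a: query[a] for a in best.attrs}
    leftover = {a: v for a, v in query.items() if a not in best.attrs}

    def post_filter(record: Record) -> bool:
        return all(record.get(a) == v for a, v in leftover.items())

    return best.action(pushed), post_filter


# A hypothetical source that can only search by author.
templates = [
    Template(frozenset({"author"}), lambda q: f"FIND AUTHOR {q['author']}"),
]
result = convert({"author": "Ullman", "year": "1995"}, templates)
```

In this example the query matches no template exactly, so the author
condition is sent to the source and the year condition is applied by
the filter. When a query matches a template exactly, the filter has
nothing left to check and accepts every record.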
OBJECT REPOSITORY AND QUERY LANGUAGE FOR SEMISTRUCTURED INFORMATION
TSIMMIS uses a simple yet powerful data model called OEM (for Object
Exchange Model). OEM is well-suited for representing heterogeneous
and semistructured information. We have designed and implemented a
repository for OEM objects called LORE (for Lightweight Object
Repository). LORE includes standard database system features such as
a storage manager, buffer manager, bulk loader, and query optimizer
and executor. LORE does not include "heavyweight" features such as
concurrency control and recovery, since these features are not needed
for LORE's current uses. We have extended the kernel query language
of TSIMMIS (called OEM-QL) to a more powerful language (called LOREL,
for LORE Language) that supports queries with advanced features such
as subqueries, aggregates, and result restructuring. LORE and LOREL
are useful within the TSIMMIS architecture in several places: to store
very large query results for browsing, to save query results for
future use, to store intermediate results during wrapper and mediator
execution, and to quickly integrate new information sources.
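
To illustrate why a labeled, nested model suits semistructured
information, the following Python sketch follows a path of labels
through objects whose structure varies. It does not use LOREL syntax
and is not part of LORE. The tuple encoding and the follow function
are assumptions made for this example.

```python
from typing import Iterator, List, Tuple, Union

# Assumed encoding: an object is (label, value), where value is either
# atomic or a list of nested subobjects.
Obj = Tuple[str, Union[str, int, List["Obj"]]]


def follow(obj: Obj, path: List[str]) -> Iterator[Obj]:
    """Yield every subobject reached from obj by following the labels
    in path. Objects lacking a label on the path are simply skipped."""
    if not path:
        yield obj
        return
    _, value = obj
    if isinstance(value, list):
        for child in value:
            if child[0] == path[0]:
                yield from follow(child, path[1:])


# Two book objects with different structure: only one records a year.
library: Obj = ("library", [
    ("book", [("title", "A"), ("year", 1995)]),
    ("book", [("title", "B")]),
])
titles = [value for _, value in follow(library, ["book", "title"])]
years = [value for _, value in follow(library, ["book", "year"])]
```

Both path lookups succeed even though the two book objects differ in
structure. The second book simply contributes nothing to the year
lookup, and no schema has to be declared beforehand.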
TECHNOLOGY TRANSITION:
An effort is starting in the summer of 1995 to transition TSIMMIS
technology to the Air Force Alternative-Fueled Vehicles (AFV)
program. TASC (a private contractor), working with the
Alternative-Fueled Vehicles Sponsored Project Office (AFVSPO) located
at Warner Robins AFB, will manage the effort. Stanford will provide
TSIMMIS technology and work with the TASC team. The goal is to
provide access to sources containing AFV data such as vendor
advertisements and test data from vehicles already in use. Several
wrappers and mediators will be implemented for this application, and
access will be provided through MOBIE. The point of contact for the
TSIMMIS part of this effort, as well as for information on use of
TSIMMIS technology, is Dr. Joachim Hammer at (415) 723-3118 or
joachim@db.stanford.edu.
Picture
See TSIMMIS home page.
Additional information on the TSIMMIS Project can be found at URL http://db.stanford.edu.
DATE PREPARED: August 21, 1995