The C3 Project at Stanford

Changes, Consistency, and Configurations in Heterogeneous Distributed Information Systems

Introduction

C3 is a DARPA-funded project within the Stanford Database Group. Our goal is to provide a comprehensive framework for managing change in an environment of heterogeneous information systems. Some of the issues that we are working on are the following:

Detecting Changes
Representing and Querying Changes
Query Subscriptions and Change Notification

Detecting Changes

How do we find out about changes to data? In an ideal world, all relevant data would reside in a database in which we could set triggers to inform us of changes as they occur. However, this ideal scenario is difficult, if not impossible, to realize in practice due to technical as well as non-technical (organizational autonomy) reasons. (For example, consider trying to find out about changes to the web pages you are interested in by using this method!) We are therefore often forced to detect changes using only snapshots of the data. In cases where the data is essentially relational, and contains keys, we can use the keys to efficiently compute the changes, as described in Efficient Algorithms to Compare Snapshots. However, in many cases, the data is not relational, or does not contain usable keys. What does detecting changes mean in such a scenario? Can we design efficient algorithms to detect changes from snapshots? To find out the answers to these questions, please read our paper, Change Detection in Hierarchically Structured Information, which appeared in SIGMOD '96, and which is available here in PostScript and HTML formats.

Using the algorithm described in the above paper, we implemented a simple HTML Diff program that compares two HTML pages and produces an annotated document describing the changes. (This program was demonstrated at the DARPA I3/POB workshop in San Diego in January 1996.) You may browse some sample output of HTML Diff to get a feel of what it does. We are currently improving both our change detection algorithm and our diff implementation in significant ways.

Our first change detection algorithm makes some simplifying assumptions based on domain characteristics to find changes efficiently. For cases in which these assumptions do not hold, we have designed a more general algorithm, called mh-diff . This algorithm also uses a richer set of change operations, e.g., subtree copy . For details, please read our paper Meaningful Change Detection in Structured Data , which appeared in SIGMOD '97, and which is available here in PostScript and HTML format.

Representing and Querying Changes

Once we have found the changes to the data of interest, what can we do with them? For small amounts of data, we may want to simply browse the changes. How do we represent changes for this purpose? As the amount of data gets larger, browsing changes quickly becomes impracticable. Wouldn't it be nice if we could query over changes just like we can query over data?

We have developed a uniform representation scheme for changes, and a simple query language for extracting the changes of interest from this representation. Our change-representation model builds on the Object Exchange Model (OEM) designed as part of the Tsimmis Project, and is called Delta-OEM (DOEM, pronounced "doom"). Our query language for changes is based on the Lorel query language that is part of the Lore effort, and is called Chorel. Please refer to our paper Representing and Querying Changes in Semistructured Data , which is to appear in ICDE '98, and which is available here in PostScript format. We have implemented DOEM and Chorel as part of a Change Object Repository (CORE) module. CORE is implemented as an extension to the Lore system.

Query Subscriptions and Change Notification

Consider some data that one is very interested in, and that one wishes to monitor closely. (For example, one's favorite web page.) Assuming we have a way to detect, represent, and query changes to this web page, we can always poll for changes by repeatedly issuing delta queries. However, it would be much nicer (and potentially more efficient) to simply subscribe to a change notification service that will notify us whenever there are changes of interest.

We are implementing a Query Subscription Service (QSS) that provides such a notification mechanism for changes in Tsimmis data sources. QSS supports many kinds of subscriptions, allowing changes to be detected, queried, and reported in a number of ways. For example, one kind of subscription notifies the user whenever a change of interest to the user occurs. Another kind allows the user to retrieve the "new" changes of interest (the changes since the last time the user checked with QSS) whenever the user wishes. This work uses the Tsimmis client support library for querying information sources, our implementation of an OEM diff program for detecting changes, and CORE for the storing and querying changes.

For a description of our implementation of a change management system based on the above ideas, please refer to Change Management in Heterogeneous Semistructured Databases (Demonstration Description) , available in PostScript format.

Project Members

Faculty
- Jennifer Widom
Research Associates and Visitors
- Serge Abiteboul
Graduate students

Publications

Change Detection in Hierarchically Structured Information. Available in PostScript and HTML formats.
Meaningful Change Detection in Structured Data . Available in PostScript and HTML formats.
Representing and Querying Changes in Semistructured Data . Available in PostScript format.
Change Management in Heterogeneous Semistructured Databases (Demonstration Description) . Available in PostScript format.

[ DB Projects | DB Group | Stanford Computer Science | Stanford University ]

Implementors Page (password protected)

Sudarshan S. Chawathe (chaw@cs.stanford.edu)
Last updated: Thu Nov 6 11:42:36 PST 1997

Why validate your HTML?