Sudarshan S. Chawathe Hector Garcia-Molina
Computer Science Department, Stanford University,
Stanford, California 94305
{chaw,hector}@cs.stanford.edu
Detecting changes by comparing data snapshots is an important
requirement for difference queries, active databases, and version
and configuration management. In this paper we focus on detecting
meaningful changes in hierarchically structured data, such as
nested-object data. This problem is much more challenging than the
corresponding one for relational or flat-file data. In order to
describe changes better, we base our work not just on the
traditional ``atomic'' insert, delete, update operations, but also
on operations that move an entire sub-tree of nodes, and that copy
an entire sub-tree. These operations allows us to describe changes
in a semantically more meaningful way. Since this change detection
problem is -hard, in this paper we present a heuristic
change detection algorithm that yields close to ``minimal''
descriptions of the changes, and that has fewer restrictions than
previous algorithms. Our algorithm is based on transforming the
change detection problem to a problem of computing a minimum-cost
edge cover of a bipartite graph. We study the quality of the
solution produced by our algorithm, as well as the running time,
both analytically and experimentally.