Next: An Edge-wise Cost Function Up: Meaningful Change Detection in Previous: Using Edge Covers to

Finding the Edge Cover

In this section we describe how MH-DIFF finds a minimal edge cover of the induced graph. The resulting cover will serve as input to algorithm CtoS (Section 4). Our goal is to find not just any minimal edge cover, but one that corresponds to a minimum-cost edit script. Let us call such an minimal edge cover the target cover.

Consider an edge e in our pruned induced graph. To get to the target cover, MH-DIFF must decide whether e should be included in the cover. To reach this decision, it would be nice if MH-DIFF knew the ``cost'' of e. That is, if e remains in the target cover, then it would be annotated (by algorithm CtoS) with some operation, and we could say that the cost of this operation is the cost of e. Unfortunately, we have a ``chicken and the egg problem'' here: CtoS cannot run until we have the target cover, and we cannot get the target cover until we know the costs it will imply. To break the impasse, our approach uses the following idea:

Instead of trying to compute the actual cost of e, we compute an upper and lower bound to this cost. These bounds can be computed without the knowledge of which other edges are included in the target cover, and serve two purposes: Firstly, they allow us to design pruning rules that are used to conservatively eliminate unnecessary edges from the induced graph. Secondly, after pruning, the bounds can guide our search for the target cover.

As an enhancement, we actually use a variation on the edge cost suggested above. The following example shows that simply ``charging'' each annotation to the edge it is on is not entirely ``fair.'' We are given a tree containing two nodes, and with the same label l. Furthermore has children and with labels a and b, respectively, and has children and with labels c and d, respectively. Suppose is a logical copy of . (That is, and are isomorphic.) Consider an edge cover that matches each node in to its copy in except that it ``cross matches'' and across the trees, as shown in Figure 8. Given this edge cover, algorithm CtoS will produce a move operation for each of the nodes , , , and . However, these move operations were caused not by any mismatching of the nodes , , , or , but instead, by the mismatching of and . Therefore it would be intuitively more fair to charge these move operations to the edges responsible for the mismatch, viz. and . To achieve this, we use the following scheme: If e is annotated with INS, DEL, or UPD in the target cover, we do charge e for this operation. However, if e is annotated by MOV, CPY, or GLU, then the parent of e, and not e is charged. We call the edge costs computed in such a fashion fair costs, and define them below:

Figure 8: Distributing edge costs fairly

Next: An Edge-wise Cost Function Up: Meaningful Change Detection in Previous: Using Edge Covers to

Sudarshan S. Chawathe
Sat Feb 22 12:28:02 PST 1997