Outline of Algorithm

Next: Aligning Children Up: Generating the Edit Previous: Generating the Edit

Outline of Algorithm

The algorithm is most easily explained as consisting of the five phases that we describe below. We use our running example from Figure 1. We are required to find a minimum cost edit script that transforms into , given the matching M shown by the dashed lines in the figure.

The Update Phase: In the update phase, we look for pairs of nodes such that the values at nodes x and y differ. For each such pair (in any order) we add the edit operation to E (recall that for a node x, denotes the value of x), and we apply the update operation to . At the end of the update phase, we have transformed such that for every pair of nodes .

Figure 4: Running example: after align phase

Figure 5: Running example: after insert phase

The Align Phase: Let the partner of a node denote the node to which it is matched (by a given matching). Suppose . We say that the children of x and y are misaligned if x has matched children u and v such that u is to the left of v in but the partner of u is to the right of the partner of v in . In Figure 1, the children of the root nodes 1 and 11 are misaligned. In the align phase we check each pair of matched internal nodes (in any order) to see if their children are misaligned. If we find that the children are misaligned, we append move operations to E to align the children. We explain how the move operations are determined in Section 4.2 below. In our running example, we append to E, and we apply the move operation to . The new is shown in Figure 4.

The Insert Phase: We assume, without loss of generality, that the roots of and are matched in M. In the insert phase, we look for an unmatched node such that its parent is matched. Suppose (i.e., y is the parent of z) and y's partner in is x. We create a new identifier w and append to E. The position k is determined with respect to the children of x and z that have already been aligned with respect to each other; details are in Section 4.3. We also apply the insert operation to and add to M. In our running example we append . The transformed and the augmented M are shown in Figure 5. At the end of the insert phase, every node in is matched but there may still be nodes in that are unmatched.

Figure 6: Running example: after delete phase

The Move Phase: In the move phase we look for pairs of nodes such that . (Recall from Section 3.1 that denotes the parent of x.) Suppose . We know that at the end of the insert phase, v has some partner u in . We append the operation to E, and we apply the move operation to . Here the position k is determined with respect to the children of u and v that have already been aligned, as in the insert phase. At the end of the move phase is isomorphic to except for unmatched nodes in . In our running example, we do not need to perform any actions in this phase.

The Delete Phase: In the delete phase we look for unmatched leaf nodes . For each such node we append to E and apply the delete operation to . (Note that this process will result in a bottom-up delete---descendents will be deleted before their ancestors.) At the end of the delete phase is isomorphic to , E is the final edit script, and M is the total matching to which E conforms. Figure 6 shows the trees and the matching after the delete phase.

Next: Aligning Children Up: Generating the Edit Previous: Generating the Edit

Sudarshan S. Chawathe
Wed Jun 19 08:22:34 PDT 1996