We now present the complete algorithm to compute a minimum cost edit
script E conforming to a given matching M between trees and
. In the algorithm, we combine the first four phases of
Section 4.1 (the update, insert, align, and move phases)
into one breadth-first scan on
. The delete phase requires a
post-order traversal of
(which visits each node after visiting
all its children). The order in which the nodes are visited and the
edit operations are generated is crucial to the correctness of the
algorithm. (For example, an insert may need to precede a move, if the
moved node becomes the child of the inserted node.) The algorithm
applies the edit operations to
as they are appended to the edit
script E. When the algorithm terminates,
is isomorphic to
. The algorithm also uses a matching
that is initially M,
and adds matches to it so that
is a total matching when the
algorithm terminates. As mentioned earlier, we assume without loss of
generality that the roots of
and
are matched in M.
Figure 8: Algorithm EditScript
Figure 9: Functions AlignChildren and FindPos
The algorithm is shown in Figure 8. It uses two procedures, AlignChildren and FindPos, shown in Figure 9. The claims made by the two statements in Algorithm EditScript that are marked with (*) are substantiated in [CRGMW95], where it is also proved that Algorithm EditScript generates a minimum cost edit script conforming to the given matching M.
Let us now consider the running time of this algorithm. We first
define the notion of misaligned nodes. Suppose and
. A move of the form
for some k is called an
intra-parent move of node x; such moves are generated in the
align phase of the algorithm. The number of misaligned nodes of
with respect to
is the minimum number of intra-parent moves
among all minimum cost edit scripts. We can show [CRGMW95]
that the running time of Algorithm EditScript is
,
where N is the total number of nodes in
and
and D is
the total number of misaligned nodes. (Note that D is typically
much smaller than N.)