Consider the complete bipartite graph B consisting of the nodes of
on one side, and the nodes of
on the other, plus the
special nodes
(on
's side) and
(on
's
side). We call B the induced graph of
and
. The
dashed lines in Figure 2 correspond to a few edges
of the induced graph. Intuitively, we would like to find a subset K
of the edges of B that tells us the correspondence between the nodes
of
and
. If an edge connects a node
to a node
, it means that n was ``derived'' from m. (For example,
n may be a copy of m.) We say m is matched to n. A
node matched to the special node
indicates that it was
inserted, and a node matched to
indicates that it was
deleted. Note that this matching between nodes need not be
one-to-one; a node may be matched to more than one other nodes. (For
example, referring to Figure 2 node 7 may be
matched to both node 52 and node 61.) The only restriction is that a
node be matched to at least one other node. Thus, finding the
correspondence between the nodes of two trees consists essentially of
finding an edge cover
of their induced graph.
The induced graph has a large number of edge covers (this number being
exponential in the number of nodes). However, we may intuitively
observe that most of these possible edge covers of B are
undesirable. For example, and edge cover that maps all nodes in
to
, and all nodes in
to
seems like a bad
choice, since it corresponds to deleting all the nodes of
and
then inserting all the nodes of
. We will define the
correspondence between an edge cover of an induced graph and an edit
script for the underlying trees formally in
Section 4, where we also describe how to
compute an edit script corresponding to an edge cover. For now, we
simply note that, given an edge cover of the induced graph, we can
compute a corresponding edit script for the underlying trees. Hence,
we would like to select an edge cover of the induced graph that
corresponds to a minimum-cost edit script.