My thesis work focuses on mapping queries and data across disparate contexts. Such translation has become especially critical now that the Internet has brought together information sources worldwide. In particular, the mapping is essential for many important applications, such as meta-searching (querying many sources through a mediator), e-commerce (e.g., comparison shopping), and web mining (querying sources and analyzing their data).
Exact Query Translation: First, I have developed a query translation mechanism for exact query processing [7]. The mechanism finds the minimal-superset mappings, which do not miss any answers (i.e., no false-negatives) and incur as few extra answers as possible (i.e., minimal false-positives). To obtain the exact query results, I also designed algorithms that derive filter queries for removing the false-positives [5]. The algorithms guarantee that the translated queries minimally subsume the original ones and that the filter queries are of minimal cost. Furthermore, since the translation machinery relies on separately supplied rules for rewriting basic query constraints, I also developed algorithms for rewriting the IR predicates commonly used for document retrieval [2], and evaluated the post-filtering cost of such rewritings [11].
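As a rough illustration of the superset-plus-filter idea, the following Python sketch translates a conjunctive query for a hypothetical target source that supports only a word-level "contains" operator. The constraint encoding, the rewrite rule, and all function names are assumptions made for this example; they are not the algorithms of [5] or [7], and the sketch makes no attempt to prove minimality.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Constraint = Tuple[str, str, str]  # (attribute, operator, value); assumed encoding


def holds(c: Constraint, r: Record) -> bool:
    """Evaluate one constraint against one record."""
    attr, op, value = c
    field = r.get(attr, "")
    if op == "equals":
        return field == value
    if op == "contains":  # case-insensitive whole-word match
        return value.lower() in field.lower().split()
    raise ValueError(f"unknown operator: {op}")


def rewrite(c: Constraint) -> Tuple[List[Constraint], bool]:
    """Hypothetical rule: return a subsuming target mapping and whether it is exact."""
    attr, op, value = c
    if op == "contains":
        return [c], True  # supported natively by the target
    if op == "equals":
        # Every word must appear: misses no answer, but ignores word order.
        return [(attr, "contains", w) for w in value.split()], False
    raise ValueError(f"unknown operator: {op}")


def translate(query: List[Constraint]) -> Tuple[List[Constraint], List[Constraint]]:
    """Return (target query, filter query) for a conjunctive query."""
    target: List[Constraint] = []
    filter_query: List[Constraint] = []
    for c in query:
        mapped, exact = rewrite(c)
        target.extend(mapped)
        if not exact:
            filter_query.append(c)  # re-check only the inexactly mapped constraints
    return target, filter_query


def select(query: List[Constraint], records: List[Record]) -> List[Record]:
    return [r for r in records if all(holds(c, r) for c in query)]


records = [
    {"author": "John Smith", "title": "Query Mapping"},
    {"author": "Smith John", "title": "Mapping Data"},
    {"author": "Jane Smith", "title": "Query Mapping"},
]
query = [("author", "equals", "John Smith"), ("title", "contains", "mapping")]

target, filter_query = translate(query)
superset = select(target, records)       # what the target source returns
exact = select(filter_query, superset)   # after post-filtering at the mediator
```

Here the target query returns the first two records (one false-positive, no false-negatives), and the filter query, which contains only the inexactly mapped constraint, removes the extra record.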
Data Translation: Second, I have adapted this query mapping framework for data translation by modeling data as a set of conjunctive constraints [8]. The machinery can deal with flat data (i.e., attribute-value pairs as shown above) as well as hierarchically structured information such as XML.
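One way to picture data as a set of conjunctive constraints is sketched below in Python. The triple encoding, the path syntax, and the function names are assumptions chosen for illustration and may differ from the modeling in [8]: each attribute-value pair becomes an equality constraint, and each text-bearing XML element becomes an equality constraint on its path.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Constraint = Tuple[str, str, str]  # (attribute or path, operator, value); assumed encoding


def flat_to_constraints(record: Dict[str, str]) -> List[Constraint]:
    """Model a flat record as a conjunction of equality constraints."""
    return [(attr, "=", value) for attr, value in sorted(record.items())]


def xml_to_constraints(elem: ET.Element, prefix: str = "") -> List[Constraint]:
    """Model an XML tree as equality constraints on root-to-element paths."""
    path = f"{prefix}/{elem.tag}"
    constraints: List[Constraint] = []
    text = (elem.text or "").strip()
    if text:
        constraints.append((path, "=", text))
    for child in elem:
        constraints.extend(xml_to_constraints(child, path))
    return constraints


flat = flat_to_constraints({"title": "Data", "author": "Smith"})

doc = ET.fromstring(
    "<book><title>Data</title><author><last>Smith</last></author></book>"
)
nested = xml_to_constraints(doc)
```

Once data is in this form, the same constraint-rewriting rules used for queries can in principle be applied to it, which is the point of reusing the query mapping framework.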
Approximate Query Translation: Third, I have further developed general approximate query translation that finds the closest mappings under virtually any closeness criterion, such as minimal-superset, maximal-subset, or a hybrid scheme that combines precision and recall [6]. I defined a customizable notion of closeness and designed a general translation machinery. As the basis, I have studied fundamental theorems on constraint dependencies and on separability across query conjuncts and disjuncts. These results are essential for any algorithm that attempts approximate query translation.
See our online demo.
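To make the notion of a customizable closeness criterion concrete, here is a small Python sketch that picks the closest of several candidate mappings by comparing result sets. It is only an illustration under stated assumptions: the candidate names, the result sets, and the scoring functions are hypothetical, the hybrid criterion is taken to be an F-measure, and the machinery in [6] is not assumed to work by enumerating results this way.

```python
from typing import Callable, Dict, Set

Closeness = Callable[[Set[int], Set[int]], float]


def precision(original: Set[int], mapped: Set[int]) -> float:
    return len(original & mapped) / len(mapped) if mapped else 1.0


def recall(original: Set[int], mapped: Set[int]) -> float:
    return len(original & mapped) / len(original) if original else 1.0


def minimal_superset(original: Set[int], mapped: Set[int]) -> float:
    """Admit only mappings that miss nothing; fewer extra answers is closer."""
    if not original <= mapped:
        return float("-inf")
    return -float(len(mapped - original))


def maximal_subset(original: Set[int], mapped: Set[int]) -> float:
    """Admit only mappings with no extra answers; fewer missed answers is closer."""
    if not mapped <= original:
        return float("-inf")
    return -float(len(original - mapped))


def hybrid(original: Set[int], mapped: Set[int], beta: float = 1.0) -> float:
    """F-measure combining precision and recall (an assumed hybrid criterion)."""
    p, r = precision(original, mapped), recall(original, mapped)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)


def closest(original: Set[int], candidates: Dict[str, Set[int]],
            closeness: Closeness) -> str:
    """Return the name of the candidate mapping that scores highest."""
    return max(candidates, key=lambda name: closeness(original, candidates[name]))


original = {1, 2, 3, 4}
candidates = {
    "A": {1, 2, 3, 4, 5, 6},  # superset with two extra answers
    "B": {1, 2, 3},           # subset missing one answer
    "C": {1, 2, 3, 4, 5},     # superset with one extra answer
    "D": {2, 3, 4, 9},        # neither superset nor subset
}

best_superset = closest(original, candidates, minimal_superset)
best_subset = closest(original, candidates, maximal_subset)
best_hybrid = closest(original, candidates, hybrid)
```

Swapping the closeness function changes which mapping is selected without touching the selection logic, which mirrors the idea of separating the closeness criterion from the translation machinery.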