Optimizing Large Join Queries in Mediation Systems

	Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-Molina
	{yerneni, chenli, ullman, hectorg}@cs.stanford.edu
		Stanford University, USA

		Abstract

 In data integration systems, queries posed to a mediator need to be
 translated into a sequence of queries to the underlying data sources. In
 a heterogeneous environment, with sources of diverse and limited query
 capabilities, not all the translations are feasible. In this paper, we
 study the problem of finding feasible and efficient query plans for
 mediator systems. We consider conjunctive queries on mediators and model
 the source capabilities through attribute-binding adornments. We use a
 simple cost model that focuses on the major costs in mediation systems,
 those involved with sending queries to sources and getting answers
 back. Under this metric, we develop two algorithms for source query
 sequencing: one based on a simple greedy strategy and another based on a
 partitioning scheme. The first algorithm produces optimal plans in some
 scenarios, and we show a linear bound on its worst-case performance when
 it misses optimal plans. The second algorithm generates optimal plans in
 more scenarios, but there is no bound on the margin by which it can miss
 the optimal plans. We also report experimental results on the
 performance of the two algorithms.
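
 To make the notions of binding adornments and feasible sequencing
 concrete, the following is a minimal illustrative sketch, not the
 algorithms developed in this paper. It assumes each subgoal carries an
 adornment string with 'b' for an attribute that must be bound and 'f' for
 one that may be free, and it attaches a fixed, hypothetical cost to each
 source query. The names Subgoal, answerable, is_feasible and
 greedy_sequence are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgoal:
    name: str
    variables: tuple    # one variable name per attribute
    adornment: str      # 'b' = must be bound, 'f' = may be free
    cost: float = 1.0   # hypothetical fixed cost of querying this source


def answerable(sg, bound):
    """True if every 'b' attribute of sg has its variable already bound."""
    return all(v in bound
               for v, a in zip(sg.variables, sg.adornment) if a == "b")


def is_feasible(sequence, initially_bound=()):
    """Check that each subgoal can be sent to its source in the given order."""
    bound = set(initially_bound)
    for sg in sequence:
        if not answerable(sg, bound):
            return False
        bound.update(sg.variables)  # answers bind all of the subgoal's variables
    return True


def greedy_sequence(subgoals, initially_bound=()):
    """Repeatedly pick the cheapest answerable subgoal (assumed heuristic).

    Returns a feasible ordering, or None if no feasible ordering exists.
    """
    bound = set(initially_bound)
    remaining = list(subgoals)
    plan = []
    while remaining:
        candidates = [sg for sg in remaining if answerable(sg, bound)]
        if not candidates:
            return None
        best = min(candidates, key=lambda sg: sg.cost)
        plan.append(best)
        remaining.remove(best)
        bound.update(best.variables)
    return plan


# Hypothetical query R(x, y), S(y, z), T(z, w) with made-up costs.
R = Subgoal("R", ("x", "y"), "ff", cost=5.0)
S = Subgoal("S", ("y", "z"), "bf", cost=1.0)
T = Subgoal("T", ("z", "w"), "bf", cost=2.0)

plan = greedy_sequence([T, S, R])
print([sg.name for sg in plan])
```

 In this example only R can be queried without bindings, so every
 feasible sequence starts with R; S then supplies the binding for z that T
 requires. A fixed per-source cost is a simplification made here for
 brevity.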