Maximizing Coverage of Mediated Web Queries

Ramana Yerneni, Felix Naumann, Hector Garcia-Molina

Abstract

Over the Web, mediators are built on large collections of sources to provide integrated access to Web content (e.g., meta-search engines). In order to minimize the expense of visiting a large number of sources, mediators need to choose a subset of sources to contact when processing queries. As fewer sources participate in processing a mediated query, the coverage of the query goes down. In this paper, we study this trade-off and develop techniques for mediators to maximize the coverage for their queries while at that same time visiting a subset of their sources. We formalize the problem; study its complexity; propose algorithms to solve it; and analyze the theoretical performance guarantees of the algorithms. We also study the performance of our algorithms through simulation experiments.