Mediated Query Processing Over Autonomous Data Sources

Ramana Yerneni

Advisor: Hector Garcia-Molina

Abstract

When processing queries over autonomous data sources, users face many challenges. We focus on two of them: the limited and diverse query-processing capabilities of data sources, and the need to integrate data from a large number of sources to answer user queries. Mediators alleviate these problems by providing extended query-processing capabilities and integrated views across data sources. In this thesis, we present the techniques we have developed to enable mediators to overcome the challenges of processing queries over large sets of autonomous sources with limited and diverse capabilities.

We develop languages to describe query-processing capabilities. We present algorithms that let mediators support powerful query interfaces to data sources by translating each user query into a sequence of simpler source queries and postprocessing operations at the mediator. We analyze the complexity of query planning at mediators for queries over large join views and develop effective optimization algorithms for this problem. We identify a class of queries that often arises in contexts involving autonomous data sources, which we call {\em fusion queries}. Conventional query-optimization techniques do not scale well to fusion queries, so we develop new techniques for optimizing them.
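To make the idea of translating a user query into simpler source queries plus mediator postprocessing concrete, here is a minimal sketch, assuming a hypothetical source that can only answer a single equality selection on a fixed set of attributes. The names `LimitedSource`, `searchable`, and `mediate` are illustrative assumptions, not the thesis's actual interfaces or algorithms. The mediator pushes one supported condition to the source and applies the remaining conditions itself.

```python
from dataclasses import dataclass


# Hypothetical source: answers only single-attribute equality queries,
# and only on the attributes listed in `searchable`.
@dataclass
class LimitedSource:
    rows: list
    searchable: set

    def query(self, attr, value):
        if attr not in self.searchable:
            raise ValueError(f"source cannot search on {attr!r}")
        return [r for r in self.rows if r.get(attr) == value]


def mediate(source, conditions):
    """Answer a conjunctive equality query the source cannot handle directly.

    Sends one supported condition to the source as a simpler source query,
    then filters the fetched rows on the remaining conditions at the mediator.
    """
    pushable = [a for a in conditions if a in source.searchable]
    if not pushable:
        raise ValueError("no condition can be sent to the source")
    # Naive choice of which condition to push; a real planner would
    # pick the one expected to return the fewest rows.
    attr = pushable[0]
    fetched = source.query(attr, conditions[attr])
    rest = {a: v for a, v in conditions.items() if a != attr}
    # Postprocessing: apply the conditions the source could not evaluate.
    return [r for r in fetched if all(r.get(a) == v for a, v in rest.items())]


books = LimitedSource(
    rows=[
        {"author": "Smith", "year": 1988, "title": "A"},
        {"author": "Smith", "year": 1997, "title": "B"},
        {"author": "Jones", "year": 1997, "title": "C"},
    ],
    searchable={"author"},
)
print(mediate(books, {"author": "Smith", "year": 1997}))
```

The source alone cannot evaluate the condition on `year`, yet the mediator still answers the full query; the price is that it fetches every `Smith` row and discards the non-matching ones locally.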

We also consider a network of mediators that work together to process user queries. In this setting, we present algorithms that compute the query capabilities of each mediator from the capabilities of the data sources and other mediators it relies on. Finally, we discuss how the query capabilities of data sources and mediators can be expressed concisely, which simplifies the user experience and makes query processing more efficient.