Isn't a WSMS the same as ...?

At first glance, a WSMS and query optimization over web services might seem to be covered by existing work in the database literature. Here we briefly explain what makes the web services problem unique and how it differs from the major pieces of related work. For details, see our paper.

Mediators / Data Integration Systems

A WSMS can be viewed as a mediator over web services, but the challenges that arise in query optimization are quite different. Traditional mediators integrate passive sources of data rather than active processing elements such as web services. Consequently, query optimization techniques for mediators focus mostly on issues such as extracting statistics from these data sources or choosing the right binding pattern to access each data source. Beyond these issues, only classical relational database optimization techniques are applied to optimize queries at the mediator. In contrast, in query processing over web services, the bottleneck is not the processing at the WSMS itself but the cost of making the expensive web service calls. Since web services are active processing elements, query plans can benefit greatly from exploiting parallelism among them. Our optimization techniques therefore focus on scheduling the web service calls and deciding the order in which to query them so that this parallelism can be exploited fully. To the best of our knowledge, exploiting parallelism among data sources has not been the focus of prior work.
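To make the benefit of parallelism concrete, here is a minimal sketch in Python. It is not the WSMS implementation: `call_service` is a hypothetical stand-in that simply sleeps to imitate a slow remote call, and the service names and latencies in the usage note are made up for illustration. It compares issuing the same calls one after another with issuing them concurrently.

```python
import asyncio
import time


async def call_service(name, latency, item):
    """Hypothetical stand-in for a remote web service call: it only sleeps."""
    await asyncio.sleep(latency)
    return (name, item)


async def sequential(items, services):
    """Issue every call one after another, waiting for each to finish."""
    results = []
    for item in items:
        for name, latency in services:
            results.append(await call_service(name, latency, item))
    return results


async def parallel(items, services):
    """Issue all calls at once and wait for them together."""
    calls = [
        call_service(name, latency, item)
        for item in items
        for name, latency in services
    ]
    # gather returns the results in the same order as the calls were listed.
    return await asyncio.gather(*calls)


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

For example, `timed(sequential([1, 2, 3], [("geo", 0.02), ("rating", 0.02)]))` waits for six simulated calls in turn, while `timed(parallel(...))` with the same arguments overlaps them and returns the same results in less wall-clock time. A real plan also has to respect data dependencies between services, which this sketch ignores.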

Parallel / Distributed Query Optimization

The problem of query optimization over web services could be considered a special case of the more general problems of parallel and distributed query optimization, each of which has been addressed extensively in the database literature. The key difference is that in a parallel / distributed database setting, one can get every node to process any desired data in any desired way, while in the web services setting, the functionality of each web service is fixed in advance; we can merely choose what data to send it for processing. This limitation results in a considerably smaller search space for query plans in the web services scenario than in full-blown distributed or parallel query processing. Consequently, we are able to guarantee optimal plans, while most work on parallel / distributed query optimization is limited to heuristics. We are not aware of any work on parallel or distributed query optimization that, when applied to the web services scenario, produces optimal plans.

Query Optimization with Expensive Predicates

Each web service could be considered an expensive predicate, and a query over web services a classical relational database query with expensive predicates (which can be optimized with known techniques). However, the execution model is substantially different: in a classical relational database, every predicate is executed on the same machine, while in the web services context, each web service executes independently using its own resources. This leads to a fundamentally different cost model, and consequently to an entirely different set of optimization techniques.
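The effect of the execution model on plan choice can be illustrated with a small sketch. The two cost formulas below are assumptions made for illustration and are not taken from this page: on a single machine, the work done by all predicates adds up, whereas independently running, pipelined services are limited by the slowest stage. The service names, costs, and selectivities are invented.

```python
from itertools import permutations

# (name, per-tuple cost, selectivity): illustrative numbers only.
SERVICES = [("A", 4.0, 0.5), ("B", 5.0, 0.1)]


def sum_cost(order):
    """Single-machine model: total work per input tuple adds up."""
    total, reach = 0.0, 1.0
    for _, cost, selectivity in order:
        total += reach * cost   # only the tuples that reach this step cost work
        reach *= selectivity    # fraction of tuples surviving this step
    return total


def bottleneck_cost(order):
    """Independent-services model (assumed): the slowest stage sets the pace."""
    worst, reach = 0.0, 1.0
    for _, cost, selectivity in order:
        worst = max(worst, reach * cost)
        reach *= selectivity
    return worst


def best_order(cost_fn, services=SERVICES):
    """Exhaustively pick the ordering that minimizes the given cost function."""
    best = min(permutations(services), key=cost_fn)
    return tuple(name for name, _, _ in best)
```

With these numbers, `best_order(sum_cost)` chooses B before A, while `best_order(bottleneck_cost)` chooses A before B, so the same two services are ordered differently depending on which cost model applies.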

WSQ/DSQ

WSQ/DSQ is a much earlier system, also from Stanford, based on an observation similar to the one underlying our optimization techniques: queries over web sources (in their case, search engines) can be sped up considerably by making several calls in parallel. However, WSQ/DSQ does not perform any cost-based optimization.
