Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data
Chris Olston and Jennifer Widom
Abstract
Strict consistency of replicated data is infeasible or not required by
many distributed applications, so current systems often permit
stale replication, in which cached copies of data values are
allowed to become out of date. Queries over cached data return an
answer quickly, but the stale answer may be unboundedly imprecise.
Alternatively, queries over remote master data return a precise
answer, but with potentially poor performance. To bridge the gap
between these two extremes, we propose a new class of replication
systems called TRAPP (Tradeoff in Replication Precision and
Performance). TRAPP systems give each user fine-grained control
over the tradeoff between precision and performance: Caches store
ranges that are guaranteed to bound the current data values, instead
of storing stale exact values. Users supply a quantitative
precision constraint along with each query. To answer a query,
TRAPP systems automatically select a combination of locally cached
bounds and exact master data stored remotely to deliver a bounded
answer consisting of a range that is no wider than the specified
precision constraint, that is guaranteed to contain the precise
answer, and that is computed as quickly as possible. This paper
defines the architecture of TRAPP replication systems and covers some
mechanics of caching data ranges. It then focuses on queries with
aggregation, presenting optimization algorithms for answering queries
with precision constraints, and reporting on performance experiments
that demonstrate the fine-grained control of the precision-performance
tradeoff offered by TRAPP systems.
Conference Paper (VLDB 2000): [PS], [PDF]. Citation: [BibTeX]
Extended Version: [PS], [PDF]
TRAPP Project Web Page: [HTML]