Graph-Based Synopses for Relational Selectivity Estimation
                    Neoklis Polyzotis, UC Santa Cruz

Accurate data synopses are crucial for relational query optimization,
as they enable estimates on the cardinality of query expressions and
effectively on the cost factors of candidate execution plans. Prior
work in this area has focused mainly on single-table summaries (e.g.,
histograms, wavelets, or samples) that are not appropriate for complex
join queries, or on schema-level summaries (e.g., join synopses, or
probabilistic relational models) that can handle complex queries well
but are applicable only to specific classes of relational schemata.

In this talk, I will present a new class of schema-level synopses,
termed Tuple Graphs (TuGs), that target the important problem of
selectivity estimation for complex join queries. In contrast to
previous approaches, TuGs are applicable to schemata with join cycles
and many-to-many relationships, thus covering a large class of data
sets that are common in real-world applications. The proposed TuG
model is based on a graph-based summarization framework that promotes
join relationships to a first-class citizen of the data instance,
essentially adopting a "semi-structured" view of the underlying
relational data. I will describe the TuG synopsis model and its
properties, and present an efficient and scalable construction
algorithm for building accurate summaries within a given storage
budget. The talk will conclude with the results of an extensive
experimental study that verifies the effectiveness of TuGs as accurate
data synopses, and demonstrates their benefits over existing
summarization techniques.