Graph-Based Synopses for Relational Selectivity Estimation Neoklis Polyzotis, UC Santa Cruz Accurate data synopses are crucial for relational query optimization, as they enable estimates on the cardinality of query expressions and effectively on the cost factors of candidate execution plans. Prior work in this area has focused mainly on single-table summaries (e.g., histograms, wavelets, or samples) that are not appropriate for complex join queries, or on schema-level summaries (e.g., join synopses, or probabilistic relational models) that can handle complex queries well but are applicable only to specific classes of relational schemata. In this talk, I will present a new class of schema-level synopses, termed Tuple Graphs (TuGs), that target the important problem of selectivity estimation for complex join queries. In contrast to previous approaches, TuGs are applicable to schemata with join cycles and many-to-many relationships, thus covering a large class of data sets that are common in real-world applications. The proposed TuG model is based on a graph-based summarization framework that promotes join relationships to a first-class citizen of the data instance, essentially adopting a "semi-structured" view of the underlying relational data. I will describe the TuG synopsis model and its properties, and present an efficient and scalable construction algorithm for building accurate summaries within a given storage budget. The talk will conclude with the results of an extensive experimental study that verifies the effectiveness of TuGs as accurate data synopses, and demonstrates their benefits over existing summarization techniques.