ABSTRACT
Many applications collect item data with many attributes. This ranges from
geophysical data indexed by (time, place), sales data indexed by (time,
product, buyer, seller), production data indexed by (time, process, product),
and so on. Analysts frequently want to aggregate this data and view it from
many different dimensions looking for patterns and exceptions. You
commonly see this as Excel pivot tables. This talk explains a visual
N-dimensional metaphor for this multi-dimensional data. It then discusses
how this visual model can be mapped into the "flat" 2D relational model.
It turns out that rollup, drill-down, and histogram is a generalization
of the SQL GROUP BY operator. More interesting is the extension mechanism
that allows SQL to get new aggregate operators (e.g. running average,
median, ...).
Efficiently evaluating these operators leads to an interesting
taxonomy of aggregate functions and leads to many interesting query
optimization problems.
This work was done jointly with Adam Bossworth, Andrew Layman, and Hamid Pirahesh.