DataCube: A Way to Look at Multi-Dimensional Data

Dr. Jim Gray

Microsoft Research


Many applications collect item data with many attributes. This ranges from geophysical data indexed by (time, place), sales data indexed by (time, product, buyer, seller), production data indexed by (time, process, product), and so on. Analysts frequently want to aggregate this data and view it from many different dimensions looking for patterns and exceptions. You commonly see this as Excel pivot tables. This talk explains a visual N-dimensional metaphor for this multi-dimensional data. It then discusses how this visual model can be mapped into the "flat" 2D relational model. It turns out that rollup, drill-down, and histogram is a generalization of the SQL GROUP BY operator. More interesting is the extension mechanism that allows SQL to get new aggregate operators (e.g. running average,
median, ...). Efficiently evaluating these operators leads to an interesting taxonomy of aggregate functions and leads to many interesting query optimization problems.

This work was done jointly with Adam Bossworth, Andrew Layman, and Hamid Pirahesh.


Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Corss-Tab, and Sub-Totals. 21 pages, February 1995, Revised July 1995.