Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube, or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower-dimensional spaces. Aggregation points are represented by an "infinite value", ALL. For example, the point (ALL, ALL, ..., ALL, sum(*)) would represent the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
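The idea in the abstract can be illustrated with a minimal sketch, assuming a small in-memory table and a sum aggregate. The function name `cube`, the `ALL` string marker, and the sales rows are hypothetical and chosen for illustration; they are not taken from the report, and the report's own algorithms may differ.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # illustrative stand-in for the abstract's ALL value


def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims`.

    Dimensions that are aggregated away are marked with ALL, so each
    result key is a point in the N-dimensional space of `dims`.
    """
    totals = defaultdict(int)
    # Visit every subset of dimensions to keep: 2**N groupings in total.
    for kept_count in range(len(dims) + 1):
        for kept in combinations(dims, kept_count):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


# Hypothetical data, invented for this example.
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
    {"model": "Ford", "year": 1995, "units": 75},
]

result = cube(sales, ["model", "year"], "units")
print(result[("Chevy", 1994)])  # a core cell of the cube
print(result[("Chevy", ALL)])   # sub-total over all years for one model
print(result[(ALL, ALL)])       # global sum of all items
```

With two dimensions the result holds the four core cells, two sub-totals per dimension, and one grand total. This brute-force version rescans the rows for each of the 2^N groupings, which is fine for a sketch but not how a database engine would be expected to evaluate the operator.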


Publisher: Institute of Electrical and Electronics Engineers, Inc.
© 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Details

Type: TechReport
URL: http://www.ieee.org/
Number: MSR-TR-95-22
Pages: 21
Institution: Microsoft Research