Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

Adam Bosworth, Jim Gray, Andrew Layman, and Hamid Pirahesh

February 1995

Data analysis applications typically aggregate data across many dimensions, looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube, or simply cube. The cube operator generalizes the histogram, cross-tabulation, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower-dimensional spaces. Aggregation points are represented by an "infinite value", ALL. For example, the point (ALL, ALL, ALL, ..., ALL, sum(*)) would represent the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
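To make the idea of ALL points concrete, here is a minimal Python sketch, assuming SUM as the aggregate and a table held in memory as a list of dicts. It is a naive illustration, not the paper's algorithm: each row is added into all 2^N points obtained by keeping each of its dimension values or replacing it with ALL, which yields every GROUP BY over every subset of the N dimensions. The `cube` function, the `ALL` sentinel object, and the sample sales rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


class _All:
    """Sentinel standing in for the ALL value; distinct from any real data value."""

    def __repr__(self):
        return "ALL"


ALL = _All()


def cube(rows, dims, measure):
    """Return {point: sum} for all 2**N group-bys over the attributes in `dims`."""
    result = defaultdict(int)
    n = len(dims)
    for row in rows:
        # Each row contributes to 2**N points: for every subset of dimensions
        # kept at their actual value, the remaining dimensions become ALL.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                point = tuple(row[dims[i]] if i in kept else ALL for i in range(n))
                result[point] += row[measure]
    return dict(result)


# Hypothetical sample data: car sales by model, year, and color.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 50},
    {"model": "Chevy", "year": 1995, "color": "white", "units": 40},
    {"model": "Ford", "year": 1994, "color": "black", "units": 85},
    {"model": "Ford", "year": 1995, "color": "white", "units": 10},
]

c = cube(sales, ("model", "year", "color"), "units")
print(c[(ALL, ALL, ALL)])        # global sum: 185
print(c[("Chevy", ALL, ALL)])    # sub-total for one model: 90
print(c[(ALL, 1994, ALL)])       # sub-total for one year: 135
```

The point (ALL, ALL, ALL) plays the role of the global sum described above, while points with a single ALL correspond to the two-dimensional cross-tab totals. Enumerating 2^N points per row is simple but costly as N grows; the sketch favors clarity over efficiency.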


Publisher | Institute of Electrical and Electronics Engineers, Inc. |

© 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Type | TechReport |

URL | http://www.ieee.org/ |

Number | MSR-TR-95-22 |

Pages | 21 |

Institution | Microsoft Research |
