Fast Set Intersection in Memory

Bolin Ding and Arnd Christian König

2011

Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear-space data structures that represent sets so that their intersection can be computed efficiently in the worst case. In general, given k preprocessed sets with n elements in total, we show how to compute their intersection in expected time O(n / sqrt(w) + kr), where r is the intersection size and w is the number of bits in a machine word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform state-of-the-art techniques on both synthetic and real data sets and workloads.

Publication type | Article |
Published in | Proceedings of the VLDB Endowment, the 37th International Conference on Very Large Data Bases (VLDB 2011) |
Pages | 255-266 |
Volume | 4 |
Number | 4 |
Publisher | Very Large Data Bases Endowment Inc. |

- Discovering Queries based on Example Tuples
- Entity Categorization over Large Document Collections
- Text Cube: Computing IR Measures for Multidimensional Text Database Analysis
