M. A. Cheema, Xuemin Lin, Haixun Wang, Jianmin Wang, and Wenjie Zhang
Top-k pairs queries have many real applications. k closest pairs queries, k furthest pairs queries and their bichromatic variants are some of the examples of the top-k pairs queries that rank the pairs on distance functions. While these queries have received significant research attention, there does not exist a unified approach that can efficiently answer all these queries. Moreover, there is no existing work that supports top-k pairs queries based on generic scoring functions. In this paper, we present a unified approach that supports a broad class of top-k pairs queries including the queries mentioned above. Our proposed approach allows the users to define a local scoring function for each attribute involved in the query and a global scoring function that computes the final score of each pair by combining its scores on different attributes. We propose efficient internal and external memory algorithms and our theoretical analysis shows that the expected performance of the algorithms is optimal when two or less attributes are involved. Our approach does not require any pre-built indexes, is easy to implement and has low memory requirement. We conduct extensive experiments to demonstrate the efficiency of our proposed approach.
|Published in||the 26th International Conference on Data Engineering (ICDE)|