Mingjie Zhu, Shuming Shi, Zaiqing Nie, and Ji-Rong Wen
A typical scenario in information retrieval and web search is to index a given type of items (e.g., web pages, images) and provide search functionality for them. In such a scenario, the basic units of indexing and retrieval are the same. Extensive study has been done for efficient top-k computation in such settings. This paper studies top-k processing for many emerging scenarios: efficiently retrieving top-k items of one type based on the inverted index of another type of items. It would be very inefficient by directly utilizing traditional top-k approaches. Here we follow TA (the Threshold Algorithm) in this scenario. We present an aggregation-aware top-k computation framework with three pruning principles upon the conventional inverted index and a novel inverted index type HybridRank, which employs the item information of both types. Experimental results show that our proposed new index structure and the aggregation-aware top-k strategy provide an efficient solution for this aggregation-aware top-k problem.