Kaushik Chakrabarti, Venkatesh Ganti, Jiawei Han, and Dong Xin
In many document collections, documents are related to objects such as document authors, products described in the document, or persons referred to in the document. In many applications, the goal is to find the related objects that best match a set of keywords. The keywords need not occur in the textual descriptions of the target objects; they may occur only in the related documents. To answer these queries, we exploit the relationships between the documents containing the keywords and the target objects related to those documents. Current keyword query paradigms do not use these relationships effectively and hence are inefficient for these queries. In this paper, we consider a class of queries called “object finder” queries. Our goal is to return the top K objects that best match a given set of keywords by exploiting the relationships between documents and objects. We design efficient algorithms by developing early termination strategies in the presence of blocking operators such as group by. Our experiments with real datasets and workloads demonstrate the effectiveness of our techniques. Although we present our techniques in the context of keyword search, they also apply to other types of ranked search (e.g., multimedia search).
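To make the query semantics concrete, the following is a minimal sketch of a naive baseline for an object finder query. It is not the algorithm from the paper. It assumes a toy scoring function (keyword occurrence counts) and sum aggregation, and the names `object_finder`, `documents`, and `doc_to_objects` are hypothetical. It scores every matching document and groups the scores by related object before ranking, which is the blocking group-by behavior that the early termination strategies are meant to avoid.

```python
import heapq
from collections import defaultdict


def object_finder(keywords, documents, doc_to_objects, k):
    """Naive baseline: score documents, group scores by object, return top k.

    Assumptions (not from the paper): a document's score is the number of
    keyword occurrences it contains, and an object's score is the sum of
    the scores of its related documents.
    """
    terms = [word.lower() for word in keywords]
    object_scores = defaultdict(float)

    for doc_id, text in documents.items():
        tokens = text.lower().split()
        # Toy relevance score: total occurrences of the query keywords.
        score = sum(tokens.count(term) for term in terms)
        if score == 0:
            continue  # document does not match any keyword
        # Group by: credit every object related to this document.
        for obj in doc_to_objects.get(doc_id, ()):
            object_scores[obj] += score

    # Ranking can only start after all matching documents are aggregated.
    return heapq.nlargest(k, object_scores.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical data: product reviews (documents) related to products (objects).
documents = {
    "d1": "lightweight laptop with long battery life",
    "d2": "battery drains fast on this laptop",
    "d3": "great camera and battery",
}
doc_to_objects = {
    "d1": ["Laptop A"],
    "d2": ["Laptop A", "Laptop B"],
    "d3": ["Phone C"],
}

top_objects = object_finder(["battery", "laptop"], documents, doc_to_objects, k=2)
print(top_objects)
```

Note that the keywords never appear in the object names themselves; the objects are found only through the documents related to them. Because the baseline touches every matching document before it can rank anything, its cost grows with the number of matches rather than with K.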
Published in: SIGMOD Conference