Zhicheng Dou, Sha Hu, Kun Chen, Ruihua Song, and Ji-Rong Wen
Most existing search result diversification algorithms diversify search results in terms of a specific dimension. In this paper, we argue that search results should be diversified in a multi-dimensional way, as queries are usually ambiguous at different levels and dimensions. We first explore mining subtopics from four types of data sources, including anchor texts, query logs, search result clusters, and web sites. Then we propose a general framework that explicitly diversifies search results based on multiple dimensions of subtopics. It balances the relevance of documents with respect to the query and the novelty of documents by measuring the coverage of subtopics. Experimental results on the TREC 2009
Web track dataset indicate that combining multiple types of subtopics do help better understand user intents. By incorporating multiple types of subtopics, our models improve the diversity of search results over the sole use of one of them, and outperform two state-of-the-art models.
In Proceedings of WSDM'11
Publisher Association for Computing Machinery, Inc.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.