Yang Song and Li-wei He
Query suggestion has been an effective approach to help users narrow down to the information they need. However, most of existing studies focused on only popular/head queries. Since rare queries possess much less information (e.g., clicks) than popular queries in the query logs, it is much more difficult to efficiently suggest relevant queries to a rare query. In this paper, we propose an optimal rare query suggestion framework by leveraging implicit feedbacks from users in the query logs. Our model resembles the principle of pseudo-relevance feedback which assumes that top-returned results by search engines are relevant. However, we argue that the clicked URLs and skipped URLs contain different levels of information and thus should be treated differently. Hence, our framework optimally combines both the click and skip information from users and uses a random walk model to optimize the query correlation. Our model specifically optimizes two parameters: (1) the restarting (jumping) rate of random walk, and (2) the combination ratio of click and skip information. Unlike the Rocchio algorithm, our learning process does not involve the content of the URLs but simply leverages the click and skip counts in the query-URL bipartite graphs. Consequently, our model is capable of scaling up to the need of commercial search engines. Experimental results on one-month query logs from a large commercial search engine with over 40 million rare queries demonstrate the superiority of our framework, with statistical significance, over the traditional random walk models and pseudo-relevance feedback models.
In WWW '10 Proceedings of the 19th international conference on World wide web
Publisher Association for Computing Machinery, Inc.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or email@example.com. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.