Zhicong Cheng, Bin Gao, and Tie-Yan Liu
26 April 2010
This paper is concerned with actively predicting search intent from user browsing behavior data. In recent years, great attention has been paid to predicting user search intent. However, the prediction has mostly been passive, because it is performed only after users submit their queries to search engines. It does not consider why users issue these queries or what triggers their information needs. According to our study, many information needs are actually triggered by what users have browsed. That is, after reading a page, a user who finds something interesting or unclear may want further information and formulate a search query accordingly. Actively predicting such search intent can benefit both search engines and their users. In this paper, we propose a series of techniques to accomplish this task. First, we extract from user browsing behavior data all the queries that users issued after reading a given page. Second, we learn a model to rank these queries according to their likelihood of being triggered by the page. Third, since search intents can be quite diverse even when triggered by the same page, we propose an optimization algorithm to diversify the ranked list of queries obtained in the second step, and then suggest the list to users. We have tested our approach on large-scale user browsing behavior data obtained from a commercial search engine. The experimental results show that our approach can predict meaningful queries for a given page, and that the search performance for these queries can be significantly improved by using the triggering page as contextual information.
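The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the ranking model or the optimization algorithm, so relative frequency stands in for the learned ranker and a greedy MMR-style re-ranking with word-overlap similarity stands in for the diversification step. The session log, the function names (`extract_queries`, `rank_queries`, `diversify`), and the `trade_off` parameter are all hypothetical.

```python
from collections import Counter

# Hypothetical toy browsing log (not data from the paper): each session is
# an ordered list of ("page", url) or ("query", text) events.
SESSIONS = [
    [("page", "news/volcano"), ("query", "iceland volcano map")],
    [("page", "news/volcano"), ("query", "iceland volcano flights")],
    [("page", "news/volcano"), ("query", "iceland volcano map")],
    [("page", "news/volcano"), ("query", "flight cancellations europe")],
    [("page", "sports/final"), ("query", "final score")],
]


def extract_queries(sessions, page):
    """Step 1: count the queries issued immediately after viewing `page`."""
    counts = Counter()
    for events in sessions:
        for (kind_a, value_a), (kind_b, value_b) in zip(events, events[1:]):
            if kind_a == "page" and value_a == page and kind_b == "query":
                counts[value_b] += 1
    return counts


def rank_queries(counts):
    """Step 2 (stand-in): score by relative frequency, not a learned model."""
    total = sum(counts.values())
    return {query: count / total for query, count in counts.items()}


def jaccard(a, b):
    """Word-overlap similarity between two query strings."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


def diversify(scores, k=3, trade_off=0.5):
    """Step 3 (stand-in): greedy MMR-style re-ranking.

    Each round picks the query with the best balance between its score and
    its similarity to the queries already selected.
    """
    selected = []
    candidates = set(scores)
    while candidates and len(selected) < k:
        def mmr(query):
            redundancy = max((jaccard(query, s) for s in selected), default=0.0)
            return trade_off * scores[query] - (1 - trade_off) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    counts = extract_queries(SESSIONS, "news/volcano")
    print(diversify(rank_queries(counts), k=3))
```

In this toy log, the two "iceland volcano" queries share most of their words, so the diversification step places the unrelated flight query between them rather than listing near-duplicates first.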
Published in: Proceedings of the 19th international conference on World wide web
Publisher: Association for Computing Machinery, Inc.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or email@example.com. The definitive version of this paper can be found at ACM’s Digital Library: http://www.acm.org/dl/.