Building bridges for web query classification

Dou Shen; Jian-Tao Sun; Qiang Yang; Zheng Chen

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval | January 2006

Published by ACM

Web query classification (QC) aims to classify Web users’ queries, which are often short and ambiguous, into a set of target categories. QC has many applications, including page ranking in Web search, targeted advertising in response to queries, and personalization. In this paper, we present a novel approach to QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective was to classify 800,000 real user queries. In our approach, we first build a bridging classifier on an intermediate taxonomy in an offline mode. This classifier is then used in an online mode to map user queries to the target categories via the intermediate taxonomy. A major innovation is that, by leveraging the similarity distribution over the intermediate taxonomy, we do not need to retrain a classifier for each new set of target categories; the bridging classifier therefore needs to be trained only once. In addition, we introduce category selection, a new method for narrowing down the part of the intermediate taxonomy that is used to classify the queries. Category selection can improve both the efficiency and the effectiveness of online classification. By combining our algorithm with the winning solution of KDDCUP 2005, we improved precision by 9.7% and F1 by 3.8% over the best results of KDDCUP 2005.
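The bridging idea in the abstract can be illustrated with a minimal sketch. It assumes that a query's score for a target category is obtained by summing, over intermediate categories, the product of a query-to-intermediate probability and an intermediate-to-target probability. The function names, the toy category labels, the probabilities, and the top-k pruning used here as a stand-in for category selection are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def bridge_scores(
    query_to_intermediate: Dict[str, float],
    intermediate_to_target: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Score target categories by summing over intermediate categories.

    Assumed form: score(target) = sum_j p(target | inter_j) * p(inter_j | query).
    """
    scores: Dict[str, float] = {}
    for inter, p_query in query_to_intermediate.items():
        for target, p_target in intermediate_to_target.get(inter, {}).items():
            scores[target] = scores.get(target, 0.0) + p_query * p_target
    return scores


def select_categories(
    query_to_intermediate: Dict[str, float], top_k: int
) -> Dict[str, float]:
    """Hypothetical pruning step: keep the top_k intermediate categories
    and renormalize. This is a simple stand-in, not the paper's criterion."""
    ranked = sorted(query_to_intermediate.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:top_k])
    total = sum(kept.values())
    return {c: p / total for c, p in kept.items()} if total > 0 else kept


# Toy numbers (invented for illustration only).
query_to_intermediate = {"inter/hardware": 0.6, "inter/electronics": 0.3, "inter/golf": 0.1}
intermediate_to_target = {
    "inter/hardware": {"target/Hardware": 0.9, "target/Shopping": 0.1},
    "inter/electronics": {"target/Shopping": 0.7, "target/Hardware": 0.3},
    "inter/golf": {"target/Sports": 1.0},
}

full_scores = bridge_scores(query_to_intermediate, intermediate_to_target)
pruned_scores = bridge_scores(
    select_categories(query_to_intermediate, top_k=2), intermediate_to_target
)
best_target = max(full_scores, key=full_scores.get)
```

In this sketch, switching to a new set of target categories only requires a new `intermediate_to_target` table; the query-to-intermediate scores are reused unchanged, which mirrors the abstract's claim that the bridging classifier is trained only once.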

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'06, August 6-11, 2006, Seattle, Washington, USA. Copyright 2006 ACM 1-59593-369-7/06/0008 ...$5.00.