Unsupervised query segmentation using only query logs

Nikita Mishra, Rishiraj Saha Roy, Niloy Ganguly, Srivatsan Laxman, and Monojit Choudhury


We introduce an unsupervised query segmentation scheme that uses query logs as the only resource and can effectively capture the structural units in queries. We believe that Web search queries have a unique syntactic structure which is distinct from that of English or a bag-of-words model. The segments discovered by our scheme help understand this underlying grammatical structure. We apply a statistical model based on Hoeffding’s Inequality to mine significant word n-grams from queries and subsequently use them for segmenting the queries. Evaluation against manually segmented queries shows that this technique can detect rare units that are missed by our Pointwise Mutual Information (PMI) baseline.
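To make the PMI baseline mentioned above concrete, here is a minimal sketch of one common way to segment queries with pointwise mutual information computed from a query log alone. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact baseline formulation, and the function names (`train_counts`, `pmi`, `segment`), the adjacent-pair scoring, and the `threshold` value are all hypothetical choices. The Hoeffding-based method itself is not reproduced here.

```python
import math
from collections import Counter


def train_counts(queries):
    """Count unigrams and adjacent-word bigrams over a list of query strings."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        words = q.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """PMI of an adjacent word pair; -inf if the pair never occurs in the log."""
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(w1, w2)] / n_bi
    p_x = unigrams[w1] / n_uni
    p_y = unigrams[w2] / n_uni
    return math.log(p_xy / (p_x * p_y))


def segment(query, unigrams, bigrams, threshold=0.0):
    """Place a segment break wherever adjacent-word PMI is at or below threshold.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    words = query.lower().split()
    if not words:
        return []
    segments = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        if pmi(prev, cur, unigrams, bigrams) > threshold:
            segments[-1].append(cur)   # strong association: extend the segment
        else:
            segments.append([cur])     # weak or unseen pair: start a new segment
    return [" ".join(s) for s in segments]


# Toy query log, invented purely for illustration.
log = ["new york hotels", "new york weather", "cheap hotels", "cheap flights"]
unigrams, bigrams = train_counts(log)
print(segment("new york flights", unigrams, bigrams))
```

A pairwise score of this kind depends on how often the words co-occur in the log, which is one reason a baseline like this can miss rare units, the limitation the abstract attributes to PMI.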


Publication type: Inproceedings
Published in: Proceedings of the Twentieth International World Wide Web Conference (WWW 2011), Companion Volume, Hyderabad, Mar 28-Apr 1

Newer versions

Rishiraj Saha Roy, Anusha Suresh, Niloy Ganguly, and Monojit Choudhury. Improving Document Ranking for Long Queries with Nested Query Segmentation, ECIR, March 2016.
