Improving Unsupervised Query Segmentation using Parts-of-Speech Sequence Information

  • Rishiraj Saha Roy
  • Yogarshi Vyas
  • Niloy Ganguly
  • Monojit Choudhury

Proceedings of the 37th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '14)

Published by ACM - Association for Computing Machinery

Publication

We present a generic method for augmenting unsupervised query segmentation by incorporating Parts-of-Speech (POS) sequence information to detect meaningful but rare n-grams. Our initial experiments, using an existing English POS tagger with two different POS tagsets as well as an unsupervised POS induction technique specifically adapted for queries, show that POS information can significantly improve query segmentation performance in all of these cases.
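
The abstract does not spell out the procedure, so the following is only a toy illustration of the general idea of using POS sequences to admit rare n-grams, not the authors' implementation. It assumes a seed lexicon of n-grams already judged meaningful by some frequency-based method, counts the POS patterns that occur in that lexicon, and then accepts rare candidate n-grams whose POS pattern is common among the seeds. The names `TOY_POS`, `pos_pattern`, `augment_lexicon`, and the threshold `min_pattern_count` are hypothetical, and the hand-written tag dictionary stands in for a real POS tagger.

```python
from collections import Counter

# Hypothetical toy word-to-tag lookup; a real system would use a POS tagger
# or an unsupervised POS induction method instead.
TOY_POS = {
    "new": "ADJ", "york": "NOUN", "red": "ADJ", "wine": "NOUN",
    "cheap": "ADJ", "flights": "NOUN", "vintage": "ADJ",
    "gramophone": "NOUN", "buy": "VERB", "in": "ADP",
}


def pos_pattern(ngram):
    """Map an n-gram (tuple of words) to its POS tag sequence."""
    return tuple(TOY_POS.get(word, "X") for word in ngram)


def augment_lexicon(seed_lexicon, candidates, min_pattern_count=2):
    """Add candidates whose POS pattern is frequent among the seed n-grams.

    seed_lexicon: n-grams already accepted by a frequency-based method.
    candidates: rare n-grams that the frequency-based method missed.
    min_pattern_count: assumed threshold on how often a POS pattern must
        occur in the seed lexicon before it is trusted.
    """
    pattern_counts = Counter(pos_pattern(ng) for ng in seed_lexicon)
    accepted = {
        ng for ng in candidates
        if pattern_counts[pos_pattern(ng)] >= min_pattern_count
    }
    return set(seed_lexicon) | accepted


if __name__ == "__main__":
    seed = {("new", "york"), ("red", "wine"), ("cheap", "flights")}
    rare = [("vintage", "gramophone"), ("buy", "in")]
    print(sorted(augment_lexicon(seed, rare)))
```

In this sketch, ("vintage", "gramophone") is admitted because its ADJ NOUN pattern matches all three seed n-grams, while ("buy", "in") is rejected because no seed shares its pattern. The augmented lexicon could then feed whatever segmentation procedure consumed the original one.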