Exploring Web Scale Language Models for Search Query Processing

It has been widely observed that search queries are composed in a very different style from that of the body or the title of a document. Many techniques that explicitly account for this language style discrepancy have shown promising results for information retrieval, yet a large-scale analysis of the extent of the language differences has been lacking. In this paper, we present an extensive study of this issue by examining the language model properties of search queries and of the three text streams associated with each web document: the body, the title, and the anchor text. Our information-theoretic analysis shows that queries seem to be composed in a way most similar to how authors summarize documents in anchor texts or titles, offering a quantitative explanation for the observations in past work.
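The information-theoretic comparison can be illustrated in miniature by asking how perplexed a language model trained on each stream is by a set of queries: lower perplexity means the queries look more like that stream. This is a sketch, not the paper's setup. It assumes hypothetical toy strings in place of the real document streams and query log, and a unigram model with add-one (Laplace) smoothing in place of the web-scale smoothed n-gram models the study uses; the function names are illustrative only.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count lowercase whitespace-separated tokens in one text stream."""
    return Counter(w for t in texts for w in t.lower().split())


def perplexity(queries, counts, vocab_size):
    """Perplexity of the query tokens under an add-one smoothed unigram model.

    PP = exp(-(1/N) * sum_i log p(w_i)), where N is the number of query tokens
    and p(w) = (c(w) + 1) / (total + vocab_size).
    """
    total = sum(counts.values())
    log_prob, n_tokens = 0.0, 0
    for q in queries:
        for w in q.lower().split():
            log_prob += math.log((counts[w] + 1) / (total + vocab_size))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


# Hypothetical toy data standing in for the real body, title, and anchor streams.
streams = {
    "body": [
        "the company was founded in the year and has offices in many cities around the world",
        "our team builds software and the software runs on many devices",
    ],
    "title": ["seattle weather forecast", "cheap flights to paris"],
    "anchor": ["weather forecast", "cheap flights", "paris hotels"],
}
queries = ["cheap flights paris", "seattle weather"]

# Shared vocabulary so every model is a proper distribution over the same words.
vocab = {w for texts in streams.values() for t in texts for w in t.lower().split()}
vocab |= {w for q in queries for w in q.lower().split()}

pp = {
    name: perplexity(queries, train_unigram(texts), len(vocab))
    for name, texts in streams.items()
}
for name, value in sorted(pp.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: {value:.1f}")
```

On this toy data the title and anchor models assign the queries lower perplexity than the body model. That mirrors the direction of the paper's finding but is not evidence for it; the real comparison requires web-scale streams and higher-order models.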

We apply these web-scale n-gram language models to three search query processing (SQP) tasks: query spelling correction, query bracketing, and long query segmentation. By controlling the size and the order of different language models, we find the perplexity metric to be a good indicator of accuracy for these query processing tasks. We show, for instance, that using smoothed language models yields significant accuracy gains for query bracketing compared to using web counts as in the literature. We also demonstrate that applying web-scale language models can have a marked accuracy advantage over smaller ones.
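Query bracketing decides whether a three-word query w1 w2 w3 groups as [w1 w2] w3 or w1 [w2 w3]. The following minimal sketch assumes a simple adjacency comparison of joint bigram probabilities, P(w1 w2) versus P(w2 w3), and contrasts raw bigram counts (a stand-in for web counts) with a Jelinek-Mercer interpolated bigram model. The corpus, the interpolation weight `lam`, and all function names are hypothetical; the paper's actual bracketing formulation and smoothing method may differ.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect unigram and adjacent-bigram counts from whitespace-tokenized text."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        words = s.lower().split()
        uni.update(words)
        bi.update(zip(words, words[1:]))
    return uni, bi


def log_joint(w1, w2, uni, bi, lam=0.7):
    """log P(w1 w2) = log P(w1) + log P(w2 | w1) under an interpolated bigram model.

    P(w2 | w1) = lam * c(w1 w2) / c(w1) + (1 - lam) * P_uni(w2), where P_uni is
    add-one smoothed with one extra slot reserved for unseen words.
    """
    n = sum(uni.values())
    v = len(uni) + 1

    def p_uni(w):
        return (uni[w] + 1) / (n + v)

    p_ml = bi[(w1, w2)] / uni[w1] if uni[w1] else 0.0
    return math.log(p_uni(w1)) + math.log(lam * p_ml + (1 - lam) * p_uni(w2))


def bracket(query, uni, bi):
    """Bracket a three-word query by comparing smoothed joint bigram probabilities."""
    w1, w2, w3 = query.lower().split()
    if log_joint(w1, w2, uni, bi) >= log_joint(w2, w3, uni, bi):
        return f"[{w1} {w2}] {w3}"
    return f"{w1} [{w2} {w3}]"


def bracket_raw_counts(query, bi):
    """Same decision from raw bigram counts; returns None on a tie (e.g. both zero)."""
    w1, w2, w3 = query.lower().split()
    left, right = bi[(w1, w2)], bi[(w2, w3)]
    if left == right:
        return None
    return f"[{w1} {w2}] {w3}" if left > right else f"{w1} [{w2} {w3}]"


# Hypothetical toy corpus standing in for web-scale n-gram statistics.
corpus = [
    "book paris hotels online",
    "cheap flights to paris",
    "paris hotels near the louvre",
    "cheap hotels in rome",
]
uni, bi = train_bigram(corpus)
for q in ["cheap paris hotels", "paris louvre rome"]:
    print(q, "->", bracket(q, uni, bi), "| raw counts:", bracket_raw_counts(q, bi))
```

For "cheap paris hotels" both methods choose "cheap [paris hotels]". For "paris louvre rome", both adjacent bigrams are unseen, so the raw counts tie at zero and `bracket_raw_counts` returns `None`, while the smoothed model still returns a bracketing, here driven entirely by unigram frequencies. Whether such a fallback improves accuracy is an empirical question that this toy does not answer.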


In Proceedings of the 19th International World Wide Web Conference (WWW’2010), Raleigh, NC


Type: Inproceedings