Rishiraj Saha Roy, Niloy Ganguly, Monojit Choudhury, and Navin Kumar Singh
July 2011
Web search queries have evolved into a language of their own. In this paper, we substantiate this fact through the analysis of complex networks constructed from query logs. As in natural language, a two-regime degree distribution in word or phrase co-occurrence networks of queries reveals the existence of a small kernel and a very large periphery. But unlike natural language, where a large fraction of sentences are formed using only the kernel words, most queries consist of units from both the kernel and the periphery. The long mean shortest path for these networks further shows that peripheral units are typically connected through nodes in the kernel, which in turn are connected through multiple hops within the kernel. The extremely large periphery implies that the likelihood of encountering a new word or segment is much higher for queries than in natural language, making the processing of unseen queries much harder than that of unseen sentences.
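As a rough illustration of the two network measures the abstract mentions (degree distribution and mean shortest path), the following minimal Python sketch builds a co-occurrence network from a toy query list. It assumes that each whitespace-separated word is a unit and that two units are linked when they appear in the same query; the function names and the toy queries are hypothetical, and the paper's actual construction may differ.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_network(queries):
    """Link two units (assumed here: whitespace-separated words) that share a query."""
    graph = defaultdict(set)
    for query in queries:
        units = set(query.lower().split())
        for unit in units:
            graph[unit]  # register the node even if the query has a single unit
        for a, b in combinations(sorted(units), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_distribution(graph):
    """Map each degree k to the number of nodes that have exactly k neighbours."""
    counts = defaultdict(int)
    for neighbours in graph.values():
        counts[len(neighbours)] += 1
    return dict(counts)


def mean_shortest_path(graph):
    """Average breadth-first-search distance over all connected ordered node pairs."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy log for illustration only; not data from the paper.
toy_queries = ["cheap flights paris", "paris hotels", "cheap hotels rome"]
network = build_cooccurrence_network(toy_queries)
print(degree_distribution(network))
print(mean_shortest_path(network))
```

On a real query log, plotting the degree distribution on log-log axes is one way to look for the two regimes (kernel and periphery) described above; this toy example is far too small to show them.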
In Proceedings of the 2nd International ACM SIGIR (Association for Computing Machinery Special Interest Group on Information Retrieval) Workshop on Query Representation and Understanding 2011 (QRU 2011)
Publisher Association for Computing Machinery, Inc.
Copyright © ACM 2011
| Type | Inproceedings |
| URL | http://ciir.cs.umass.edu/sigir2011/qru/roy+al.pdf |
| Pages | 5-8 |