How good is a span of terms? Exploiting Proximity to Improve Web Retrieval

Proceedings of SIGIR |

Published by Association for Computing Machinery, Inc.

Ranking search results is a fundamental problem in information retrieval. In this paper we explore whether the use of proximity and phrase information can improve web retrieval accuracy. We build on existing research by incorporating novel ranking features based on flexible proximity terms with recent state-of-the-art machine learning ranking models. We introduce a method of determining the goodness of a set of proximity terms that takes advantage of the structured nature of web documents, document metadata, and phrasal information from search engine user query logs. We perform experiments on a large real-world Web data collection and show that using the goodness score of flexible proximity terms can improve ranking accuracy over state-of-the-art ranking methods by as much as 13%. We also show that we can improve accuracy on the hardest queries by as much as 9% relative to state-of-the-art approaches.