Consistent Phrase Relevance Measures

  • Scott Wen-tau Yih
  • Chris Meek

Proceedings of The 2nd Annual International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD-08 Workshop)

Measuring the relevance between a document and a phrase is fundamental to many information retrieval and matching tasks, including on-line advertising. In this paper, we explore two approaches for measuring the relevance between a document and a phrase, aiming to provide consistent relevance scores for both in-document and out-of-document phrases. The first approach is a similarity-based method that represents both the document and the phrase as term vectors to derive a real-valued relevance score. The second approach takes as input the relevance estimates of some in-document phrases and uses Gaussian Process Regression to predict the score of a target out-of-document phrase. While both approaches work well, the best result is given by a Gaussian Process Regression model, which is significantly better than the similarity-based approach and 10% better than a baseline similarity method using bag-of-words vectors.
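
To make the bag-of-words baseline mentioned in the abstract concrete, here is a minimal sketch of scoring a phrase against a document by cosine similarity of term-frequency vectors. The whitespace tokenizer, the function names (term_vector, cosine_similarity, relevance), and the example strings are assumptions made for illustration; the abstract does not specify how the paper builds or weights its vectors, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumed tokenizer: lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance(document, phrase):
    """Real-valued relevance score of a phrase with respect to a document."""
    return cosine_similarity(term_vector(document), term_vector(phrase))


# Hypothetical example text, not taken from the paper.
document = "digital camera reviews and digital camera prices"
in_doc = relevance(document, "digital camera")   # phrase occurs in the document
out_doc = relevance(document, "laptop bag")      # phrase shares no terms with it
```

In this sketch a phrase that shares no terms with the document scores exactly zero, however related it may be in meaning. That is the kind of inconsistency between in-document and out-of-document phrases the two approaches in the abstract aim to address.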