A Machine Learning Approach for Improved BM25 Retrieval

Krysta M. Svore and Christopher J.C. Burges

Abstract

BM25 is one of the most widely used information retrieval functions because of its consistently high retrieval accuracy. Despite its widespread use, few studies have examined its effectiveness when the document description spans a single field or a combination of multiple fields. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields, such as anchor text and query click information, no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that learns, using LambdaRank, from the input attributes of BM25. Our model significantly improves retrieval effectiveness when the document description is over single or multiple fields. Our data-driven approach is fast and effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures.

Details

Publication type: TechReport
Number: MSR-TR-2009-92
Pages: 25
Institution: Microsoft Research
Publisher: Microsoft