A Machine Learning Approach for Improved BM25 Retrieval

BM25 is one of the most widely used information retrieval functions because of its consistently high retrieval accuracy. Despite its widespread use, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields such as anchor text and query click information no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that learns, using LambdaRank, from the input attributes of BM25. Our model significantly improves retrieval effectiveness when the document description is over single or multiple fields. Our data-driven approach is fast, effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures.
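
For orientation, the sketch below shows the textbook single-field BM25 scoring function, making explicit the input attributes the abstract refers to (term frequency, document frequency, and document length) and the parameters that normally require tuning. It is not the learned model described in the report. The function name, argument names, and the default values k1 = 1.2 and b = 0.75 are illustrative assumptions; the IDF term uses the classic Robertson/Sparck Jones form.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, doc_freq, num_docs,
               k1=1.2, b=0.75):
    """Textbook BM25 score of one document (one field) for a query.

    query_terms: list of query terms
    doc_tf:      dict term -> frequency of the term in this document
    doc_len:     length of this document in tokens
    avg_doc_len: average document length in the collection
    doc_freq:    dict term -> number of documents containing the term
    num_docs:    number of documents in the collection
    k1, b:       free parameters (assumed defaults, normally tuned by hand)
    """
    # Length normalisation: b = 0 ignores length, b = 1 normalises fully.
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        df = doc_freq.get(term, 0)
        # Classic IDF; it can go negative for terms in over half the documents.
        idf = math.log((num_docs - df + 0.5) / (df + 0.5))
        # Saturating term-frequency component.
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score
```

The three per-term inputs (tf, df, and document length) are the kind of attributes a learned ranker could consume directly in place of this fixed functional form, which is the direction the abstract describes.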

LearningBM25MSRTechReport.pdf
PDF file

Publisher: Microsoft
© 2008 Microsoft Corporation. All rights reserved.

Details

Type: TechReport
Number: MSR-TR-2009-92
Pages: 25
Institution: Microsoft Research