A Machine Learning Approach for Improved BM25 Retrieval

BM25 is one of the most widely used information retrieval functions because of its consistently high retrieval accuracy. Despite its widespread use, few studies have examined its effectiveness when the document description spans a single field or a combination of multiple fields. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields, such as anchor text and query click information, no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that uses LambdaRank to learn from the input attributes of BM25. Our model significantly improves retrieval effectiveness when the document description is over single or multiple fields. Our data-driven approach is fast and effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures.
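
For readers unfamiliar with the baseline, the following is a minimal sketch of the classic single-field BM25 scoring function, not the learned model described in the report. It illustrates the kind of input attributes a BM25-style ranker consumes (term frequency, document frequency, and document length) and the free parameters that normally require tuning. The function name, argument names, and the default values k1 = 1.2 and b = 0.75 are illustrative assumptions, not taken from the report.

```python
import math


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Classic BM25 score of one document (one field) for a query.

    query_terms: list of query tokens
    doc_terms:   list of tokens in the document field
    doc_freq:    dict mapping term -> number of documents containing it
    num_docs:    total number of documents in the collection
    avg_doc_len: average field length over the collection, in tokens
    k1, b:       saturation and length-normalization parameters
                 (assumed defaults; these are what usually needs tuning)
    """
    doc_len = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)  # document frequency in the collection
        # Robertson/Sparck Jones idf; negative when df > num_docs / 2.
        idf = math.log((num_docs - df + 0.5) / (df + 0.5))
        # Saturating, length-normalized term frequency.
        length_norm = 1.0 - b + b * doc_len / avg_doc_len
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score


if __name__ == "__main__":
    # Toy collection statistics, made up for illustration only.
    doc = ["learning", "to", "rank", "with", "bm25"]
    stats = {"bm25": 2, "rank": 5}
    print(bm25_score(["bm25", "rank"], doc, stats, num_docs=100, avg_doc_len=5.0))
```

In this sketch the score depends on the raw attributes only through a fixed functional form governed by k1 and b. A learned ranker trained on the same attributes, as the abstract describes, would replace that fixed form with a function fit to relevance data.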


Publisher  Microsoft
© 2008 Microsoft Corporation. All rights reserved.

Details

Type  TechReport
Number  MSR-TR-2009-92
Pages  25
Institution  Microsoft Research