Latent Additivity: Combining Homogeneous Evidence

  • Shuming Shi
  • Ruihua Song
  • Ji-Rong Wen

MSR-TR-2006-110

The relevance ranking problem in information retrieval and Web search is essentially the task of computing aggregated scores from potentially large amounts of evidence. This paper focuses on computing an aggregated score for a homogeneous evidence set (HES), an evidence collection in which all evidence items are symmetric. Because the evidence items in an HES are typically highly dependent on one another, and the number of evidence items may vary from document to document, many existing techniques fail to handle the problem properly. We propose a simple, intuitive, and efficient approach to homogeneous evidence score combination. The approach can be derived in two different ways from two separate information retrieval models: the first extends the BM25 formula by making a latent additivity assumption; the second adopts the recently proposed gravitational information retrieval model. By taking the dependency between evidence items into account, the approach can be seen as a generalization of some existing score combination formulas. We have tested our approach on both Text Retrieval Conference (TREC) collections and a dataset collected by a large-scale commercial Web search engine. The approach could be a practical choice for homogeneous evidence combination and could replace some of the existing heuristic formulas.