Direct Optimization of Evaluation Measures in Learning to Rank

  • Jun Xu
  • Hang Li
  • Yisha Peng
  • Min Lu
  • Wei-Ying Ma

MSR-TR-2010-171

One of the central issues in learning to rank for Information Retrieval (IR) is how to develop algorithms that construct ranking models by directly optimizing the evaluation measures used in IR, such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG). In this paper, we conduct a comprehensive study of the approach of directly optimizing evaluation measures in learning to rank for IR. We focus on methods that minimize loss functions upper bounding the basic loss function defined on the IR measures. We first provide a general framework for the study, based on upper bound analysis, in which two types of upper bounds are discussed. We then theoretically analyze the two types of upper bounds and show that new algorithms can be derived on the basis of the analysis; we present two such algorithms, called AdaRank and PermuRank. We compare the direct optimization methods AdaRank, PermuRank, and SVMmap on benchmark datasets, and also compare them with the conventional methods Ranking SVM and RankBoost. Experimental results show that the methods based on direct optimization of evaluation measures consistently outperform the conventional methods; however, no significant differences exist among the performances of the direct optimization methods themselves.
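To make the measures concrete, below is a minimal Python sketch of MAP's per-query component (Average Precision) and of NDCG. The function names and the (2^gain - 1) / log-position formulation of NDCG are common conventions and are assumptions here, not necessarily the exact definitions used in the paper; the basic loss function that the paper upper-bounds is typically one minus such a measure on the predicted ranking.

    import math

    def average_precision(ranked_labels):
        """AP for one query; ranked_labels are binary relevance judgments
        listed in ranked order. MAP averages AP over all queries."""
        hits, precision_sum = 0, 0.0
        for k, rel in enumerate(ranked_labels, start=1):
            if rel:
                hits += 1
                precision_sum += hits / k
        return precision_sum / hits if hits else 0.0

    def ndcg(ranked_gains, k=None):
        """NDCG@k for one query; ranked_gains are graded relevance labels
        listed in ranked order. The ideal ranking sorts gains descending."""
        def dcg(gains):
            return sum((2 ** g - 1) / math.log2(i + 2)
                       for i, g in enumerate(gains))
        top = ranked_gains[:k] if k else ranked_gains
        ideal = sorted(ranked_gains, reverse=True)
        idcg = dcg(ideal[:k] if k else ideal)
        return dcg(top) / idcg if idcg > 0 else 0.0

    # Two relevant documents, placed at ranks 1 and 3:
    print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2 ≈ 0.833
    print(ndcg([3, 2, 3, 0, 1, 2], k=6))    # ≈ 0.95 for this near-ideal order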

Publication Downloads

AdaRank

April 11, 2011

The basic idea of AdaRank is to repeatedly construct "weak rankers" based on reweighted training queries and to linearly combine the weak rankers for making ranking predictions. In learning, AdaRank minimizes a loss function defined directly on performance measures. The details of AdaRank can be found in the paper "AdaRank: A Boosting Algorithm for Information Retrieval."
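The following is a hedged Python sketch of that training loop, following the boosting-style selection, combination-coefficient, and exponential reweighting updates described in the AdaRank paper. The queries, weak_rankers, and measure interfaces are illustrative assumptions for this sketch, not the actual API of the released code.

    import math

    def adarank(queries, weak_rankers, measure, rounds=50):
        """Sketch of AdaRank training (interfaces are illustrative).

        queries      -- list of (docs, labels) pairs, one per training query
        weak_rankers -- candidate scorers; each maps docs to per-document scores
        measure      -- IR measure E(scores, labels) in [0, 1], e.g. NDCG
        Returns (alpha, weak_ranker) pairs; the final ranker scores documents
        by the alpha-weighted sum of the weak rankers' scores.
        """
        n = len(queries)
        weights = [1.0 / n] * n          # P_1: uniform distribution over queries
        ensemble = []

        def combined(docs):              # score docs with the current ensemble
            totals = [0.0] * len(docs)
            for alpha, h in ensemble:
                for j, s in enumerate(h(docs)):
                    totals[j] += alpha * s
            return totals

        for _ in range(rounds):
            # 1. Pick the weak ranker with the best weighted performance.
            h = max(weak_rankers,
                    key=lambda wr: sum(w * measure(wr(d), y)
                                       for w, (d, y) in zip(weights, queries)))
            # 2. Set its combination coefficient from weighted performance,
            #    in the style of AdaBoost.
            perf = [measure(h(d), y) for d, y in queries]
            num = sum(w * (1 + e) for w, e in zip(weights, perf))
            den = sum(w * (1 - e) for w, e in zip(weights, perf))
            alpha = 0.5 * math.log(max(num, 1e-12) / max(den, 1e-12))
            ensemble.append((alpha, h))
            # 3. Re-weight queries: queries the ensemble still ranks poorly
            #    (low measure value) get exponentially larger weight next round.
            raw = [math.exp(-measure(combined(d), y)) for d, y in queries]
            z = sum(raw)
            weights = [r / z for r in raw]

        return ensemble

Because the loss being minimized is defined through the measure itself, hard queries (those on which the current ensemble scores poorly) dominate later rounds, which is the mechanism by which AdaRank optimizes the performance measure directly rather than a surrogate pairwise loss.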