Jun Xu, Hang Li, Tie-Yan Liu, Yisha Peng, Min Lu, and Wei-Ying Ma
One of the central issues in learning to rank for Information Retrieval (IR) is to develop algorithms that construct ranking models by directly optimizing evaluation measures used in information retrieval, such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG). In this paper, we aim to conduct a comprehensive study on the approach of directly optimizing evaluation measures in learning to rank for IR. We focus on the methods that minimize loss functions upper bounding the basic loss function defined on the IR measures. We first provide a general framework for the study, which is based on upper bound analysis and two types of upper bounds are discussed. Moreover, we make theoretical analysis the two types of upper bounds and show that we can derive new algorithms on the basis of this analysis and present two new algorithms called AdaRank and PermuRank. We make comparisons between direct optimization methods of AdaRank, PermuRank, and SVMmap, using benchmark datasets. We also compare them with conventional methods of Ranking SVM and RankBoost. Experimental results show that the methods based on direct optimization of ranking measures can always outperform these conventional methods. However, no significant difference exists among the performances of the direct optimization methods themselves.