Learning to Rank with SoftRank and Gaussian Processes

John Guiver and Edward Snelson


In this paper we address the issue of learning to rank for document retrieval using Thurstonian models based on sparse Gaussian processes. Thurstonian models represent each document for a given query as a probability distribution in a score space; these distributions over scores naturally give rise to distributions over document rankings. However, in general we do not have observed rankings with which to train the model; instead, each document in the training set is judged to have a particular relevance level: for example "Bad", "Fair", "Good", or "Excellent". The performance of the model is then evaluated using information retrieval (IR) metrics such as Normalised Discounted Cumulative Gain (NDCG). Recently Taylor et al. presented a method called SoftRank which allows the direct gradient optimisation of a smoothed version of NDCG using a Thurstonian model. In this approach, document scores are represented by the outputs of a neural network, and score distributions are created artificially by adding random noise to the scores. The SoftRank mechanism is a general one; it can be applied to different IR metrics, and make use of different underlying models. In this paper we extend the SoftRank framework to make use of the score uncertainties which are naturally provided by a Gaussian process (GP), which is a probabilistic non-linear regression model. We further develop the model by using sparse Gaussian process techniques, which give improved performance and efficiency, and show competitive results against baseline methods when tested on the publicly available LETOR OHSUMED data set. We also explore how the available uncertainty information can be used in prediction and how it affects model performance.


Publication typeInproceedings
Published inSIGIR'08
PublisherAssociation for Computing Machinery, Inc.
> Publications > Learning to Rank with SoftRank and Gaussian Processes