Yahoo! Learning to Rank Challenge Datasets
Yahoo! Labs organized a learning to rank challenge in March 2010. Two large-scale datasets were released. The challenge consisted of two tracks: a standard learning to rank track and a transfer learning track. It was open to all research groups in academia and industry.
The datasets come from web search ranking and are a subset of what Yahoo! uses to train its ranking function. They consist of feature vectors extracted from query-url pairs along with relevance judgments. The relevance judgments can take 5 different values, from 0 (irrelevant) to 4 (perfectly relevant). The queries, urls, and feature descriptions are not disclosed; only the feature values are. There are two datasets for this challenge, each corresponding to a different country: a large one (labeled set1) and a small one (labeled set2). The two datasets are related but also differ to some extent. Each dataset is divided into 3 sets: training, validation, and test.
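As a minimal sketch of the data layout described above, the Python snippet below models one judged query-url pair and groups pairs by query. The names `JudgedPair`, `query_id`, and `group_by_query` are hypothetical and not part of the released files; since the queries and urls are not disclosed, an opaque integer query identifier is assumed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

MIN_GRADE, MAX_GRADE = 0, 4  # 0 = irrelevant, 4 = perfectly relevant


@dataclass
class JudgedPair:
    """One query-url pair: an opaque query id, a relevance grade, feature values."""
    query_id: int               # hypothetical identifier; real queries are not disclosed
    grade: int                  # relevance judgment in {0, 1, 2, 3, 4}
    features: Dict[int, float]  # feature index -> feature value

    def __post_init__(self) -> None:
        # Reject grades outside the five values described in the text.
        if not MIN_GRADE <= self.grade <= MAX_GRADE:
            raise ValueError(f"grade {self.grade} outside [{MIN_GRADE}, {MAX_GRADE}]")


def group_by_query(pairs: List[JudgedPair]) -> Dict[int, List[JudgedPair]]:
    """Collect the pairs belonging to each query, preserving input order."""
    groups: Dict[int, List[JudgedPair]] = defaultdict(list)
    for pair in pairs:
        groups[pair.query_id].append(pair)
    return dict(groups)
```

Grouping by query is a natural shape for ranking data, because a ranking function orders the urls within each query rather than across queries.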
The statistics for the various sets are as follows:
|Set 1||Set 2|
There are 700 features in total. Some of them are defined in set1 or set2 only, while some others are defined in both sets. When a feature is undefined for a set, its value is 0. All the features have been normalized to be in the [0,1] range.
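The zero-fill convention above can be sketched in Python as follows. The snippet assumes, purely for illustration, that the defined features of a pair arrive as sparse `index:value` tokens with 1-based indices; the text here does not specify the file format, and `to_dense` is a hypothetical helper.

```python
from typing import List

NUM_FEATURES = 700  # total number of features stated above


def to_dense(tokens: List[str], num_features: int = NUM_FEATURES) -> List[float]:
    """Expand sparse 'index:value' tokens into a dense feature vector.

    Features that are absent stay at 0.0, matching the convention that an
    undefined feature has value 0. Indices are assumed to be 1-based.
    """
    dense = [0.0] * num_features
    for token in tokens:
        index_text, value_text = token.split(":", 1)
        index, value = int(index_text), float(value_text)
        if not 1 <= index <= num_features:
            raise ValueError(f"feature index {index} outside 1..{num_features}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"feature {index} has value {value} outside [0, 1]")
        dense[index - 1] = value
    return dense
```

For example, `to_dense(["1:0.25", "700:1.0"])` returns a list of length 700 whose first entry is 0.25, whose last entry is 1.0, and whose remaining entries are 0.0. Because 0 is also a valid normalized value, a dense vector alone cannot distinguish an undefined feature from one that is defined and equal to 0.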
More details can be found at Yahoo! Learning to Rank Challenge.