LETOR is a package of benchmark data sets for LEarning TO Rank, released by
Microsoft Research Asia.
Ranking is the central problem for many applications, and using machine
learning technologies to learn the ranking function has been a promising
research direction. However, the lack of public benchmark datasets (with
standard features, relevance judgments, data partitioning, and evaluation
metrics) makes it difficult to compare existing work.
To solve this problem, in LETOR version 1.0, we extracted features for each
query-document pair in the OHSUMED and TREC collections, which are widely
used in the information retrieval (IR) literature. Our extracted
features cover most of the 'standard' features in IR, including classical
features (such as term frequency, inverse document frequency, BM25 and
language models for IR) and features proposed in recent SIGIR papers
(such as HostRank, feature propagation and Topical PageRank). Note
that from these features, one cannot reconstruct the original documents in
the OHSUMED and TREC collections. We benchmarked several state-of-the-art
ranking models with these features and provide baseline results for future
studies. We also released an evaluation tool that computes precision
(P@n and MAP) and normalized discounted cumulative gain (NDCG), so that
the experimental results of different methods can be compared easily and
impartially using a single tool.
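As a rough illustration of the metrics the evaluation tool reports, the
following Python sketch computes P@n, MAP and NDCG from ranked lists of
relevance labels. It is not the LETOR tool itself. The function names are
hypothetical, any label greater than zero is treated as relevant, average
precision assumes every relevant document appears in the ranked list, and
NDCG uses the common gain 2^r - 1 with a log2(1 + position) discount, which
may differ in detail from the released tool.

```python
import math


def precision_at_n(relevances, n):
    """P@n: fraction of the top-n results that are relevant (label > 0)."""
    return sum(1 for r in relevances[:n] if r > 0) / n


def average_precision(relevances):
    """AP: mean of P@k over the positions k that hold a relevant document.

    Assumes all relevant documents for the query appear in `relevances`.
    """
    hits = 0
    total = 0.0
    for k, r in enumerate(relevances, start=1):
        if r > 0:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: average of AP over all queries."""
    return sum(average_precision(rels) for rels in ranked_lists) / len(ranked_lists)


def dcg_at_n(relevances, n):
    """DCG@n with gain 2^r - 1 and discount log2(1 + position) (assumed form)."""
    return sum(
        (2 ** r - 1) / math.log2(1 + j)
        for j, r in enumerate(relevances[:n], start=1)
    )


def ndcg_at_n(relevances, n):
    """NDCG@n: DCG@n divided by the DCG@n of the ideal (sorted) ordering."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0


# Relevance labels of the documents returned for one query, in ranked order.
ranked = [2, 0, 1, 0]
print(precision_at_n(ranked, 2))
print(average_precision(ranked))
print(ndcg_at_n(ranked, 4))
```

Each input list holds the graded relevance judgments of one query's results
in the order a ranking model returned them, so the same functions can score
any model's output on the same judgments.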
Please see the LETOR website for more details.
Please cite the following paper when you use our LETOR dataset in your
research:
Tie-Yan Liu, Tao Qin, Jun Xu, Wenying Xiong and Hang Li, LETOR: Benchmark
dataset for research on learning to rank for information retrieval, LR4IR 2007, in conjunction with SIGIR 2007.