LETOR is a package of benchmark data sets for research on LEarning TO Rank. LETOR version 1.0 contains two data sets: OHSUMED and TREC (TD2003 and TD2004). It also provides basic documents, evaluation tools, and baseline evaluation results.
Note: By installing, copying, or otherwise using this software, you agree to be bound by the terms of its license.
LETOR is a package of benchmark data sets for LEarning TO Rank, released by Microsoft Research Asia.
Ranking is the central problem in many applications, and using machine learning to learn the ranking function is a promising research direction. However, the lack of public benchmark datasets (with standard features, relevance judgments, data partitioning, and evaluation metrics) makes existing work difficult to compare.
To solve this problem, in LETOR version 1.0 we extracted features for each query-document pair in the OHSUMED and TREC collections, which are widely used in the information retrieval (IR) literature. The extracted features cover most of the 'standard' features in IR, including classical features (such as term frequency, inverse document frequency, BM25, and language models for IR) and features proposed in recent SIGIR papers (such as HostRank, feature propagation, and Topical PageRank). Note that the original documents in the OHSUMED and TREC collections cannot be reconstructed from these features. We benchmarked several state-of-the-art ranking models with these features and provide baseline results for future studies. We also released an evaluation tool that computes precision (P@n and MAP) and normalized discounted cumulative gain (NDCG), in the hope that using this single tool will allow the experimental results of different methods to be compared easily and impartially.
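For readers unfamiliar with the metrics named above, the following Python sketch shows one common way to compute P@n, average precision (whose mean over queries gives MAP), and NDCG for a single query. It is an illustration only and not the LETOR evaluation tool: the function names are hypothetical, the input is assumed to be the list of graded relevance labels of all judged documents for one query in ranked order, any label greater than 0 is treated as relevant for the precision measures, and the NDCG gain (2^rel - 1) and discount (log2 of rank + 1) are a widely used formulation that may differ in detail from the one the released tool implements.

```python
import math

def precision_at_n(rels, n):
    """Fraction of the top-n ranked documents that are relevant (label > 0)."""
    return sum(1 for r in rels[:n] if r > 0) / n

def average_precision(rels):
    """Mean of P@k taken at each rank k that holds a relevant document.

    Assumes `rels` covers every judged document for the query, so the
    number of hits equals the number of relevant documents.
    """
    hits = 0
    total = 0.0
    for k, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def dcg_at_n(rels, n):
    """Discounted cumulative gain with gain 2^rel - 1 and log2(rank + 1) discount."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:n], start=1))

def ndcg_at_n(rels, n):
    """DCG normalized by the DCG of the ideal (label-sorted) ranking."""
    ideal = dcg_at_n(sorted(rels, reverse=True), n)
    return dcg_at_n(rels, n) / ideal if ideal > 0 else 0.0

def mean_average_precision(rels_per_query):
    """MAP: average of per-query average precision."""
    return sum(average_precision(r) for r in rels_per_query) / len(rels_per_query)

# Hypothetical ranked labels for one query (2 = highly relevant, 0 = irrelevant).
ranked_labels = [2, 0, 1, 0]
print(precision_at_n(ranked_labels, 2))
print(average_precision(ranked_labels))
print(ndcg_at_n(ranked_labels, 4))
```

Because every method is scored by the same functions on the same labels, differences in the reported numbers reflect the rankings themselves rather than differences in metric definitions, which is the motivation for a single shared evaluation tool.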
Note: this dataset is still in beta, and we are working to make it more reliable and usable by incorporating user feedback. Minor patches may be released on the LETOR website (http://research.microsoft.com/en-us/um/beijing/projects/letor/), and major upgrades may appear on the MSR download site at a later date. For more information, send email to tyliu@microsoft.com.



