LETOR4.0 Downloads
To use the datasets, you must read and accept the online agreement. By using the datasets, you agree to be bound by the terms of the license.
Datasets
Note that the two semi-supervised ranking datasets were updated on Jan. 7, 2010. If you are using the old versions, please download the new ones.

| Setting | Dataset | Size |
|---|---|---|
| Supervised ranking | MQ2007 | ~65 MB |
| | MQ2008 | ~15 MB |
| Semi-supervised ranking | MQ2007-semi | ~940 MB |
| | MQ2008-semi | ~650 MB |
| Rank aggregation | MQ2007-agg | ~20 MB |
| | MQ2008-agg | ~4 MB |
| Listwise ranking | MQ2007-list | ~950 MB |
| | MQ2008-list | ~670 MB |
Evaluation tools
The evaluation scripts for LETOR4.0 differ slightly from those for LETOR3.0. Do not use the LETOR3.0 tools on LETOR4.0 data, or vice versa.
Evaluation script for supervised ranking, semi-supervised ranking and rank aggregation
Evaluation script for listwise ranking
Significance test script for all four settings
Possible issues
If you are using a Linux machine and run into problems with the scripts, you may try the following solution from Sergio Daniel. Thanks to Sergio for sharing!
-------------------------
The evaluation script (http://research.microsoft.com/en-us/um/beijing/projects/letor//LETOR4.0/Evaluation/Eval-Score-4.0.pl.txt) isn't working for me on the LETOR4.0 MQ2008 dataset. I use Perl v5.14.2 on a Linux machine. I made a small modification and now it runs =)
I replaced the line:
if ($lnFea =~ m/^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$/)
with:
if ($lnFea =~ m/^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+).$/)
Sergio.
-------------------------
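Sergio's change works because the extra `.` lets the pattern consume one character after the `prob` value before `$`. A likely cause, which the source does not confirm, is that the data files use Windows-style `\r\n` line endings, leaving a `\r` that `$` will not skip. The minimal Python sketch below translates the original pattern from Perl and adds a hypothetical, more tolerant variant that accepts any trailing whitespace. The names `ORIGINAL`, `TOLERANT` and `parse_line` are illustrative and not part of the LETOR tools, and the sample line is made up (shortened to two features), not copied from MQ2008.

```python
import re

# Regex from the original Perl script, translated to Python syntax.
# Without re.MULTILINE, Python's `$` (like Perl's default `$`) matches only at
# the end of the string or just before a final "\n", never before a "\r".
ORIGINAL = re.compile(
    r"^(\d+) qid:([^\s]+).*?#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$"
)

# Hypothetical tolerant variant: `\s*` absorbs any trailing whitespace
# (such as a leftover "\r"), and also matches when there is none.
TOLERANT = re.compile(
    r"^(\d+) qid:([^\s]+).*?#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)\s*$"
)


def parse_line(line):
    """Return label, qid, docid, inc and prob of one line as a dict, or None."""
    m = TOLERANT.match(line)
    if m is None:
        return None
    label, qid, docid, inc, prob = m.groups()
    return {
        "label": int(label),
        "qid": qid,
        "docid": docid,
        "inc": float(inc),
        "prob": float(prob),
    }


# Illustrative line with a CRLF ending (shortened to two features; not
# copied from the actual dataset files).
sample = (
    "2 qid:10032 1:0.056537 2:0.000000 "
    "#docid = GX029-35-5894638 inc = 0.0119881192468859 prob = 0.139842\r\n"
)

print(ORIGINAL.match(sample))      # None: the "\r" blocks `$`
print(parse_line(sample)["prob"])  # 0.139842
```

Using `\s*` rather than a single `.` also avoids a side effect of the one-character fix: on a line with no trailing character, `.` would consume the last digit of the `prob` value instead.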
| Type | Data | Size |
|---|---|---|
| Meta data | Meta data for MQ2007 query set | ~60 MB |
| | Meta data for MQ2008 query set | ~50 MB |
| | Collection info | ~1 KB |
| Relation information | Link graph of Gov2 collection | ~480 MB |
| | Sitemap of Gov2 collection | ~65 MB |
| | Similarity for MQ2007 query set | ~4.3 GB |
| | Similarity for MQ2008 query set | ~4.9 GB |

