John Guiver, Stefano Mizzaro, and Stephen Robertson
We consider the issue of evaluating information retrieval systems on the basis of a limited number of topics. In contrast to statistically based work on sample sizes, we hypothesise that some topics or topic sets are better than others at predicting true system effectiveness, and that with the right choice of topics, accurate predictions can be obtained from small topic sets. Using a variety of effectiveness metrics and measures of goodness of prediction, a study of a set of TREC and NTCIR results confirms this hypothesis, and provides evidence that the value of a topic set for this purpose does generalise.
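The abstract does not say which effectiveness metrics or goodness-of-prediction measures are used. As a minimal sketch, assuming that each system's mean score over the full topic set stands in for its "true" effectiveness and that Kendall's tau between system orderings serves as the goodness-of-prediction measure, the idea that some topic subsets predict better than others can be shown by scoring every k-topic subset against the full-set ranking. The score matrix and the function names (`effectiveness`, `kendall_tau`, `rank_subsets`) are hypothetical illustrations, not the authors' procedure or data.

```python
from itertools import combinations
from statistics import mean

# Hypothetical, made-up per-topic scores (e.g. average precision):
# one list per system, one entry per topic. Illustrative only.
SCORES = {
    "sysA": [0.60, 0.10, 0.45, 0.30, 0.80],
    "sysB": [0.55, 0.20, 0.40, 0.35, 0.50],
    "sysC": [0.20, 0.15, 0.50, 0.25, 0.40],
    "sysD": [0.30, 0.05, 0.20, 0.40, 0.35],
}


def effectiveness(scores, topics):
    """Mean score of each system over the given topic indices."""
    return {s: mean(v[t] for t in topics) for s, v in scores.items()}


def kendall_tau(x, y):
    """Kendall's tau-a between two {system: score} dicts; ties count as 0."""
    systems = list(x)
    concordant = discordant = 0
    for a, b in combinations(systems, 2):
        sign = (x[a] - x[b]) * (y[a] - y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs


def rank_subsets(scores, k):
    """Score every k-topic subset by how well its mean-score ordering of
    systems agrees with the full-set ordering (assumed 'true' effectiveness)."""
    n_topics = len(next(iter(scores.values())))
    full = effectiveness(scores, range(n_topics))
    results = []
    for subset in combinations(range(n_topics), k):
        tau = kendall_tau(full, effectiveness(scores, subset))
        results.append((tau, subset))
    results.sort(reverse=True)  # best-predicting subsets first
    return results


if __name__ == "__main__":
    ranked = rank_subsets(SCORES, k=2)
    best, worst = ranked[0], ranked[-1]
    print("best subset:", best[1], "tau =", round(best[0], 3))
    print("worst subset:", worst[1], "tau =", round(worst[0], 3))
```

Exhaustive enumeration is only practical for toy sizes: with 50 topics there are already over two million 5-topic subsets, so any realistic study would need a sampling or heuristic search instead.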
In ACM Transactions on Information Systems (TOIS), volume 27, issue 4.
Publisher: Association for Computing Machinery, Inc.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or email@example.com. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.