Simon Hill, Hugo Zaragoza, Ralf Herbrich, and Peter J. Rayner
In this paper we study the problem of generalisation in information retrieval. In particular we study precision-recall curves and the average precision value. We provide two types of bounds: large-deviation bounds of the average precision and maximum deviation bounds with respect to a given point of the precision recall curve. The first type of bounds are useful to answer the question: how far can true average precision be from the value observed on a test collection? The second is useful for obtaining bounds on average precision when tight bounds on a particular point of the curve can be established, as is the case when training SVMs or Perceptrons for document categorisation.
|Published in||Proceedings of the ACM SIGIR Workshop on Mathematical and Formal Methods in Information Retrieval|