Simon Hill, Hugo Zaragoza, Ralf Herbrich, and Peter J. Rayner
In this paper we study the problem of generalisation in information retrieval. In particular we study precision-recall curves and the average precision value. We provide two types of bounds: large-deviation bounds of the average precision and maximum deviation bounds with respect to a given point of the precision recall curve. The first type of bounds are useful to answer the question: how far can true average precision be from the value observed on a test collection? The second is useful for obtaining bounds on average precision when tight bounds on a particular point of the curve can be established, as is the case when training SVMs or Perceptrons for document categorisation.
In Proceedings of the ACM SIGIR Workshop on Mathematical and Formal Methods in Information Retrieval