PAC-Bayesian compression bounds on the prediction error of learning algorithms for classification

Thore Graepel; Ralf Herbrich; John Shawe-Taylor

Machine Learning | January 2005, Vol 59: pp. 55-76

We consider bounds on the prediction error of classification algorithms based on sample compression. We refine the notion of a compression scheme to distinguish between schemes that are invariant under permutation and repetition of the training examples and schemes that are not, which leads to different prediction error bounds. We also extend known results on compression to the case of non-zero empirical risk. We provide bounds on the prediction error of classifiers returned by mistake-driven online learning algorithms by interpreting mistake bounds as bounds on the size of the algorithm's compression scheme. This leads to a bound on the prediction error of perceptron solutions that depends on the margin a support vector machine would achieve on the same training sample. Furthermore, using the compression property, we derive bounds on the average prediction error of kernel classifiers in the PAC-Bayesian framework. These bounds assume a prior measure over the expansion coefficients in the data-dependent kernel expansion and bound the average prediction error uniformly over subsets of the space of expansion coefficients.
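To make the link between mistake bounds and compression concrete, the following is a minimal sketch, not the paper's construction. It assumes a plain perceptron without bias on a small, made-up, linearly separable sample; the names `perceptron_mistakes` and `reconstruct` and the toy data are illustrative assumptions. The sketch records the ordered sequence of training indices on which the perceptron made a mistake and then rebuilds the same weight vector from that sequence alone, so the number of mistakes caps the size of the compressed representation. Because an index may recur and the order of updates is kept, this illustrates a representation in which order and repetition are retained.

```python
def perceptron_mistakes(xs, ys, max_epochs=100):
    """Run a bias-free perceptron.

    Returns the final weight vector and the ordered list of training
    indices on which a mistake (and therefore an update) occurred.
    """
    w = [0.0] * len(xs[0])
    mistakes = []
    for _ in range(max_epochs):
        clean_pass = True
        for i, (x, y) in enumerate(zip(xs, ys)):
            # A non-positive margin counts as a mistake and triggers an update.
            if y * sum(wj * xj for wj, xj in zip(w, x)) <= 0:
                w = [wj + y * xj for wj, xj in zip(w, x)]
                mistakes.append(i)
                clean_pass = False
        if clean_pass:
            break  # no mistakes in a full pass: the sample is separated
    return w, mistakes


def reconstruct(xs, ys, mistakes):
    """Rebuild the weight vector using only the examples in the mistake sequence."""
    w = [0.0] * len(xs[0])
    for i in mistakes:
        w = [wj + ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


# Hypothetical toy sample, chosen to be linearly separable through the origin.
xs = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-3.0, -1.0), (0.5, 2.0)]
ys = [1, 1, -1, -1, 1]

w, mistakes = perceptron_mistakes(xs, ys)
w_rec = reconstruct(xs, ys, mistakes)
print("mistake sequence:", mistakes)
print("weights match:", w == w_rec)
```

Any upper bound on the number of perceptron mistakes therefore also bounds the length of the sequence needed to reproduce the classifier, which is the quantity a compression-based error bound depends on.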