Kernel Methods and Convex Optimisation

Thore Graepel and Ralf Herbrich


The Bayes Point Machine


From a Bayesian perspective support vector machines choose the hypothesis corresponding to the largest possible hyper-sphere that can be inscribed in version space, i.e. in the space of all consistent hypotheses given a training set. Those boundaries of version space which are tangent to the hyper-sphere define the support vectors. An alternative and potentially better approach is to construct the hypothesis using the whole of version space. This is achieved by using a Bayes point machine which finds the midpoint of the region of intersection of all hyper-planes bisecting version space into two halves of equal volume (the Bayes point). It is known that the centre of mass of version space approximates the Bayes point. We investigate estimating the centre of mass by averaging over the trajectory of a billiard ball bouncing in version space. Experimental results indicate that Bayes point machines consistently outperform support vector machines.
 

The Semi-Definite Programming Machine: Transformation Invariant Learning


Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Previous approaches are either based on regularisation or on the generation of virtual (transformed) examples. We developed a new framework for learning linear classifiers under known transformations based on semi-definite programming. We present a new learning algorithm - the Semi-Definite Programming Machine (SDPM) - which is able to find a maximum margin hyper-plane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in dual variables and allows to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one trans- formation parameter, and to learning with kernels are discussed. In experiments we use a Taylor expansion to locally approximate rotational invariance in pixel images from USPS and find improvements over known methods.

 

Machine Learning Approaches to Ordinal Regression


In contrast to the standard machine learning tasks of classification and metric regression we investigate the problem of predicting variables of ordinal scale, a setting referred to as ordinal regression. This problem arises frequently in the social sciences and in information retrieval where human preferences play a major role. Whilst approaches proposed in Statistics rely on a probability model of a latent (unobserved) variable we present a distribution independent risk formulation of ordinal regression which allows us to derive uniform convergence bounds. Applying these bounds we present a large margin algorithm that is based on a mapping from objects to scalar utility values though classifying pairs of objects. We give experimental results for an Information Retrieval task which show that our algorithm outperforms more naive approaches to ordinal regression such as support vector classification and support vector regression in the case of more than two ranks.
 

Informative Vector Machine


We present a framework for sparse Gaussian process methods which uses forward selection with criteria based on information-theoretical principles, previously suggested for active learning. In contrast to most previous work on sparse GPs, our goal is not only to learn sparse predictors (which can be evaluated in O(d) rather than O(n), d<<n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n d^2), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet it requires only a fraction of the training time. In contrast to the SVM, our approximation produces estimates of predictive probabilities (`error bars'), allows for Bayesian model selection and is less complex in implementation.
 


Relevant publications


Links


Machine Learning and PerceptionMachine Learning—Kernel Methods