Semidefinite Programming for Classification Learning
Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Approaches are either based on regularisation or on the generation of virtual (transformed) examples. We developed a new framework for learning linear classifiers under known transformations based on semi-definite programming. We present a new learning algorithm - the Semi-Definite Programming Machine (SDPM) - which is able to find a maximum margin hyper-plane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in dual variables and allows to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one trans- formation parameter, and to learning with kernels are conceivable.
- Thore Graepel and Ralf Herbrich, Invariant Pattern Recognition by Semidefinite Programming Machines, in Advances in Neural Information Processing Systems 16, MIT Press, January 2004
- Thore Graepel, Ralf Herbrich, Andriy Kharechko, and John Shawe-Taylor, Semidefinite Programming by Perceptron Learning, in Advances in Neural Information Processing Systems 16, MIT Press, January 2004
Adaptive Margin Machines
Former approaches for learning kernel classifiers like Quadratic Programming Machines (SVMs), and Linear Programming Machines were based on minimization of a regularized margin loss where the margin was treated equivalently for each training pattern. We propose a reformulation of the minimization problem such that adaptive margins (AMM) for each training pattern are utilized. Furthermore, we give bounds on the generalization error of AMMs which justify their robustness against outliers. We show experimentally that the generalization error of AMMs is comparable to QP- and LP-Machines on benchmark datasets from the UCI repository.
- Jason Weston and Ralf Herbrich, Adaptive Margin Support Vector Machines, in Advances in Large Margin Classifiers, pp. 281–296, MIT Press, January 2000
- Ralf Herbrich and Jason Weston, Adaptive Margin Support Vector Machines for Classification Learning, in Proceedings of the Ninth International Conference on Artificial Neural Networks, January 1999
Ordinal Regression
In contrast to the standard machine learning tasks of classification and metric regression we investigate the problem of predicting variables of ordinal scale, a setting referred to as ordinal regression. This problem arises frequently in the social sciences and in information retrieval where human preferences play a major role. Whilst approaches proposed in Statistics rely on a probability model of a latent (unobserved) variable we present a distribution independent risk formulation of ordinal regression which allows us to derive uniform convergence bounds. Applying these bounds we present a large margin algorithm that is based on a mapping from objects to scalar utility values though classifying pairs of objects.
- Ralf Herbrich, Thore Graepel, and Klaus Obermayer, Large Margin Rank Boundaries for Ordinal Regression, pp. 115–132, MIT Press, January 2000
- Ralf Herbrich, Thore Graepel, and Klaus Obermayer, Regression Models for Ordinal Data: A Machine Learning Approach, no. TR 99-3, January 1999
- Ralf Herbrich, Thore Graepel, Peter Bollmann–Sdorra, and Klaus Obermayer, Learning a Preference Relation in IR, in Proceedings Workshop Text Categorization and Machine Learning, International Conference on Machine Learning 1998, January 1998
Classification Learning on Proximity Data
We investigate the problem of learning a classification task on data represented in terms of their pair wise proximities. This representation does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors, from which pair wise proximities can always be calculated. Our approach based on a linear threshold model in the proximity values themselves, which is optimized using Structural Risk Minimization. We show that prior knowledge about the problem can be incorporated by the choice of distance measures and examine different metrics w.r.t. their generalization.
- Thore Graepel, Ralf Herbrich, Peter Bollmann–Sdorra, and Klaus Obermayer, Classification on Pairwise Proximity Data, in Advances in Neural Information Processing Systems 11, MIT Press, January 1999
- Thore Graepel, Ralf Herbrich, Bernhard Schölkopf, Alex Smola, Peter Bartlett, Klaus Robert-Müller, Klaus Obermayer, and Robert Williamson, Classification on Proximity Data with LP–Machines, in Proceedings of the Ninth International Conference on Artificial Neural Networks, January 1999



