Support vector machines are a set of algorithms that learn from data by creating models that maximize the margin between classes.
Support vector machines (SVMs) are a family of algorithms for classification, regression, transduction, novelty detection, and semi-supervised learning. They work by choosing a model that maximizes the margin on a training set, that is, the distance between the decision boundary and the closest training examples.
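As a brief illustration of what "maximizing the margin" means, the linearly separable two-class case can be written as the optimization problem below. The notation is an assumption made for this sketch: training points x_i with labels y_i in {-1, +1}, weight vector w, and threshold b, with the decision function taken as w · x - b.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i - b) \ge 1, \qquad i = 1, \dots, N
```

The width of the margin is 2 / ||w||, so minimizing ||w|| maximizes the margin. Non-separable data is usually handled by adding slack variables whose total is penalized by a parameter C.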
SVMs were originally developed by Vladimir Vapnik in 1963. Since the mid-90s, an energetic research community has grown around them. If you want to learn more about SVMs, you can read Chris Burges' tutorial. Nello Cristianini and John Shawe-Taylor have written a textbook about them. Bernhard Schölkopf and Alex Smola wrote a textbook about kernel methods, which are a closely related set of methods.
Since 1998, we've done basic research into making SVMs more user-friendly. Our research has resulted in:
- SMO: A fast algorithm for training SVMs from data, which is easy to understand and code.
- A method for calibrating the output of an SVM to yield probabilities.
- A simple method to convert a multi-class problem into a series of faster-to-solve two-class SVMs.
- A method to apply SVMs to find unusual items in a training set (novelty detection).
- An online approximation to SVMs.
See the list of publications, below, for complete citations.
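To make the probability calibration item concrete, here is a minimal sketch that fits a sigmoid P(y = 1 | f) = 1 / (1 + exp(A f + B)) to raw SVM outputs f. The function names, the toy scores, and the use of plain gradient descent are assumptions made for illustration; the published method uses a more careful optimizer, so treat this as a simplified stand-in rather than the paper's procedure.

```python
import math


def sigmoid_prob(f, A, B):
    """P(y = 1 | f) = 1 / (1 + exp(A*f + B)), computed without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_sigmoid(scores, labels, lr=0.01, steps=5000):
    """Fit A and B by gradient descent on the cross-entropy.

    scores: raw SVM outputs f_i; labels: +1 or -1.
    Targets are smoothed away from 0 and 1, which keeps the fit
    finite even when the scores separate the classes perfectly.
    """
    n_pos = sum(1 for t in labels if t == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if t == 1 else t_neg for t in labels]

    A, B = 0.0, 0.0
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, targets):
            p = sigmoid_prob(f, A, B)
            # d(cross-entropy)/d(A*f + B) = t - p for this parameterization
            grad_A += (t - p) * f
            grad_B += (t - p)
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B


# Hypothetical SVM outputs on a held-out set, with their true labels.
demo_scores = [-2.1, -1.3, -0.4, 0.2, -0.1, 0.6, 1.5, 2.4]
demo_labels = [-1, -1, -1, -1, 1, 1, 1, 1]
A, B = fit_sigmoid(demo_scores, demo_labels)
for f in (-2.0, 0.0, 2.0):
    print(f, round(sigmoid_prob(f, A, B), 3))
```

A negative fitted A means the probability of the positive class rises with the SVM output. In practice the sigmoid should be fitted on data the SVM was not trained on, to avoid biased estimates.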
Data sets and software
The real-world data sets described in the technical report (below) are available in a compressed ASCII format (zip format). Both the adult data and the web data are available. There is a readme.txt file in each zip archive that explains the format of the file. The testing set for the adult data, the testing set for the web data, and the MNIST data set are also available.
MSR currently does not have any software that implements SVMs. LIBSVM is a popular package that is based on an SMO-like algorithm.
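For readers who want to see the shape of an SMO-style solver, the sketch below trains a two-class SVM by repeatedly picking a pair of Lagrange multipliers and solving for them analytically. It is a simplified teaching variant, not the full SMO heuristics and not LIBSVM's algorithm: the second multiplier is chosen at random, and every name in it (train_smo_simplified, decision_value, the toy data) is hypothetical. It assumes the decision function f(x) = sum_k alpha_k y_k K(x_k, x) - b.

```python
import random


def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_smo_simplified(X, y, C=1.0, tol=1e-3, max_passes=20,
                         max_sweeps=10000, kernel=linear_kernel, seed=0):
    """Return (alpha, b) for labels y in {-1, +1}."""
    rng = random.Random(seed)
    n = len(X)
    alpha = [0.0] * n
    b = 0.0
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

    def f(i):
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) - b

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # Skip multipliers that already satisfy the KKT conditions.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.choice([k for k in range(n) if k != i])
            E_j = f(j) - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Ends of the feasible segment for alpha[j] inside the box [0, C].
            if y[i] != y[j]:
                lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            if lo == hi:
                continue
            eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
            if eta <= 0:
                continue
            # Analytic optimum along the constraint line, clipped to the box.
            a_j_new = min(hi, max(lo, a_j + y[j] * (E_i - E_j) / eta))
            if abs(a_j_new - a_j) < 1e-8:
                continue
            a_i_new = a_i + y[i] * y[j] * (a_j - a_j_new)
            # Thresholds that make the KKT conditions hold for i and for j.
            b1 = (E_i + y[i] * (a_i_new - a_i) * K[i][i]
                  + y[j] * (a_j_new - a_j) * K[i][j] + b)
            b2 = (E_j + y[i] * (a_i_new - a_i) * K[i][j]
                  + y[j] * (a_j_new - a_j) * K[j][j] + b)
            if 0 < a_i_new < C:
                b = b1
            elif 0 < a_j_new < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            alpha[i], alpha[j] = a_i_new, a_j_new
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def decision_value(X, y, alpha, b, x, kernel=linear_kernel):
    return sum(a * t * kernel(xk, x) for a, t, xk in zip(alpha, y, X)) - b


# Toy, linearly separable data (made up for this sketch).
toy_X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
toy_y = [1, 1, 1, -1, -1, -1]
toy_alpha, toy_b = train_smo_simplified(toy_X, toy_y, C=10.0)
print([round(a, 3) for a in toy_alpha], round(toy_b, 3))
```

Each pair update keeps the sum of alpha_k y_k unchanged and keeps both multipliers inside [0, C], which is why only two multipliers need to move at a time. For real workloads, an established package is a better choice than a sketch like this.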
Check here for errata on the SMO "Fast training" physical paper (already corrected in the on-line version).
- Edward Harrington, Ralf Herbrich, Jyrki Kivinen, John C. Platt, and Robert C. Williamson, Online Bayes Point Machines, in Proceedings of the Seventh Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, January 2003
- John C. Platt, Bernhard Schölkopf, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson, Estimating the Support of a High-Dimensional Distribution, no. MSR-TR-99-87, November 1999
- John C. Platt, Probabilities for SV Machines, in Advances in Large Margin Classifiers, MIT Press, March 1999
- John Platt, Using Analytic QP and Sparseness to Speed Training of Support Vector Machines, in Proc. Advances in Neural Information Processing Systems 11, January 1999
- John C. Platt, John Shawe-Taylor, and Nello Cristianini, Large Margin DAGs for Multiclass Classification, in Proc. Advances in Neural Information Processing Systems 12, January 1999
- John Platt, Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, no. MSR-TR-98-14, April 1998
- John C. Platt, Fast Training of Support Vector Machines Using Sequential Minimal Optimization, in Advances in Kernel Methods - Support Vector Learning, MIT Press, January 1998
- David Heckerman, John Platt, Mehran Sahami, and Susan Dumais, Inductive Learning Algorithms and Representations for Text Categorization, in 7th International Conference on Information and Knowledge Management, January 1998
Related external publications
Sathiya Keerthi and colleagues have a paper that describes an improved SMO: instead of updating a single threshold, they update the bounds on permissible thresholds. They report substantial improvement in speed, especially for extreme C values.
Gary Flake and Steve Lawrence have an efficient SMO algorithm for Support Vector Regression.
