Support Vector Machines

Support vector machines are a set of algorithms that learn from data by creating models that maximize their margin on a training set.

Support vector machines (SVMs) are a family of algorithms for classification, regression, transduction, novelty detection, and semi-supervised learning. They work by choosing a model that maximizes the margin on a training set.
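
To make the margin criterion concrete, here is a minimal sketch (in Python, with made-up toy data) that scores two candidate separating hyperplanes by their geometric margin; an SVM would prefer the first, because its margin is larger:

```python
import numpy as np

# A toy linearly separable training set: two classes in the plane.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

def margin(w, b, X, y):
    """Geometric margin of the hyperplane w.x + b = 0: the distance
    from the hyperplane to the closest training point, signed so that
    misclassified points give a negative value."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Two hyperplanes that both separate the data perfectly...
print(margin(np.array([1.0, 1.0]), 0.0, X, y))  # ~2.83
print(margin(np.array([1.0, 0.0]), 0.0, X, y))  # 2.0
# ...but the first leaves more room for error on unseen points.
```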

SVMs were originally developed by Vladimir Vapnik in 1963. Since the mid-90s, an energetic research community has grown around them. If you want to learn more about SVMs, you can read Chris Burges' tutorial. Nello Cristianini and John Shawe-Taylor have written a textbook about them, and Bernhard Schölkopf and Alex Smola wrote a textbook about kernel methods, a closely related set of techniques.

Since 1998, we've done basic research into making SVMs more user-friendly; see the list of publications, below, for complete citations of this work.

Data sets and software

The real-world data sets described in the technical report (below) are available in a compressed ASCII format (zip format). Both the adult data and the web data are available. A readme.txt file in each zip archive explains the format of the data. The testing set for the adult data, the testing set for the web data, and the MNIST data set are also available.
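
A quick way to inspect the archives is shown below (a Python sketch; the archive name adult.zip is a hypothetical stand-in for whichever file you downloaded):

```python
import zipfile

ARCHIVE = "adult.zip"  # hypothetical name; use the archive you downloaded

with zipfile.ZipFile(ARCHIVE) as zf:
    # Each archive contains a readme.txt describing its ASCII format.
    print(zf.read("readme.txt").decode("ascii", errors="replace"))
    # List the other members to find the training and testing files.
    for name in zf.namelist():
        print(name)
```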

MSR currently does not have any software that implements SVMs. LIBSVM is a popular package that is based on an SMO-like algorithm.
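
If you just want to train an SVM, one convenient route is scikit-learn's SVC class, whose implementation is built on LIBSVM. The sketch below uses synthetic stand-in data, not the adult, web, or MNIST sets described above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data, not the data sets described above.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel with soft-margin parameter C (LIBSVM's default kernel choice).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```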

Check here for errata on the printed version of the SMO "Fast training" paper (the errors are already corrected in the on-line version).

Publications

Related external publications

Sathiya Keerthi and colleagues have a paper that describes an improved SMO: instead of updating a single threshold, they update the bounds on permissible thresholds. They report substantial improvement in speed, especially for extreme C values.
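
The heart of their modification fits in a few lines. The sketch below (illustrative numpy code, not the authors' implementation; F denotes the usual SMO error cache, f(x_i) - y_i) computes the two threshold bounds and the resulting stopping test:

```python
import numpy as np

def threshold_bounds(alpha, y, F, C):
    """Bounds on the permissible threshold b in Keerthi et al.'s variant.

    Instead of maintaining one threshold, track b_up = min of F over the
    'up' index set and b_low = max of F over the 'low' index set; the KKT
    conditions hold to tolerance tau exactly when b_low <= b_up + 2 * tau.
    """
    i_up = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
    i_low = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
    return F[i_up].min(), F[i_low].max()

def converged(alpha, y, F, C, tau=1e-3):
    b_up, b_low = threshold_bounds(alpha, y, F, C)
    return b_low <= b_up + 2 * tau
```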

Gary Flake and Steve Lawrence have an efficient SMO algorithm for Support Vector Regression.