Sequential Minimal Optimization


Sequential Minimal Optimization (or SMO) is a fast method to train Support Vector Machines (SVMs). Training an SVM requires the solution of a very large quadratic programming (QP) optimization problem. SMO breaks this large QP problem into a series of smallest possible QP problems. These small QP problems are solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The amount of memory required for SMO is linear in the training set size, which allows SMO to handle very large training sets. Without kernel caching, SMO scales somewhere between linear and quadratic in the training set size for various test problems, while a standard projected conjugate gradient (PCG) chunking algorithm scales somewhere between linear and cubic in the training set size. SMO's computation time is dominated by SVM evaluation, hence SMO is fastest for linear SVMs and sparse data sets. For the MNIST database, SMO is 1.7 times faster than PCG chunking, while for the UCI Adult database and linear SVMs, SMO can be 1500 times faster than the PCG chunking algorithm.

My Publications

·         J. Platt, Using Sparseness and Analytic QP to Speed Training of Support Vector Machines, in Advances in Neural Information Processing Systems 11, M. S. Kearns, S. A. Solla, D. A. Cohn, eds., MIT Press, (1999). Available in PDF format (69K) or PostScript format (101K). This is a seven page paper, but it mentions the latest heuristics (which are about 2x faster than those described below) and also gives timing comparisons to Thorsten Joachims' SVMlight simulator.

·         J. Platt, Fast Training of Support Vector Machines using Sequential Minimal Optimization, in Advances in Kernel Methods - Support Vector Learning,  B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press, to appear, (1998). Available in PDF format (278K) or Compressed PostScript format (111K). This is a peer-reviewed book chapter and has more details than the technical report, below. The chapter does not contain an overview of SVMs or a bibliography, however. Note: as of 8/11/98, the equation numbers have changed from the older version.

·         J. Platt, Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, Microsoft Research Technical Report MSR-TR-98-14, (1998). Available in PDF format (88K) or Compressed PostScript format (238 K). An older technical report about SMO --- contains an overview of SVMs.

Implementation Notes

There are some unclear ideas and errors in  the “Fast Training” paper that I would like to clarify on this web page:

·         Equations (12.9) and (12.10) in the paper are for when the decision function (1.7) subtracts b, not adds it.

·         Equation (12.18) has a sign error: the plus should be a minus.

·         The sentence after (12.24) should read “is negative” not “is positive”

·         In the pseudo-code, after the “if (eta < 0)” statement, right before the “if (|a2-alph2| < …)” statement, a2 should be checked to see if it is within 1e-8 of 0 or C. If so, then a2 should be set to the nearest bound. This prevents round-off error from mistakenly forcing a point to be a support vector. Richard Lippmann points out that this value of 1e-8 is appropriate when inputs have approximately unit variance. Therefore, it is a good idea to scale your inputs (or kernel outputs) so that they are on the order of unity.

I’ve fixed the paper on the web site.

Also, outliers can cause SVMs in general to behave oddly. For example, I’ve seen data sets where the SVM result is all zero weights. Outliers can also cause numerical problems in SMO. So, if you are getting strange results, I would filter out any severe outliers from your data set.

Data Sets

The real-world data sets described in the technical report are available in a compressed ASCII format (zip format). Both the adult data and the web data are available. There is a readme.txt file in each zip archive that explains the format of the file. The testing set for the adult data, the testing set for the web data set, and the MNIST data set is also available.

Implementations

Xianping Ge of UCI has implemented SMO in C++.

Chih-Chung Chang and Chih-Jen Lin have taken a version of SMO (from Keerthi), extended it, and wrapped it into a big package, LIBSVM, that does a lot of different SVM-related tasks.

Other Publications

Gray Flake and Steve Lawrence have an efficient SMO algorithm for Support Vector Regression.

 

Sathiya Keerthi and colleagues have a technical report that describes an improved SMO: instead of updating a single threshold, they update the bounds on permissible thresholds. They report substantial improvement in speed, especially for extreme C values.


Do you have questions or comments about the technical report or the data sets? Have you tried SMO on a real problem? Send me e-mail! My e-mail address is jplatt AT microsoft.com (non-standard form used to prevent spam).


This page was written by John Platt of the CCSP Group of Microsoft Research. Last updated: Sep 17, 2003.

Terms of Use
comment information

pubdisclaimer.gif (2130 bytes)