Sequential Minimal
Optimization (or SMO) is a fast method to train Support
Vector Machines (SVMs). Training an SVM requires the solution of a very large
quadratic programming (QP) optimization problem. SMO breaks this large QP
problem into a series of smallest possible QP problems. These small QP problems
are solved analytically, which avoids using a time-consuming numerical QP
optimization as an inner loop. The amount of memory required for SMO is linear
in the training set size, which allows SMO to handle very large training sets.
Without kernel caching, SMO scales somewhere between linear and quadratic in
the training set size for various test problems, while a standard projected
conjugate gradient (PCG) chunking algorithm scales somewhere between linear and
cubic in the training set size. SMO's computation time is dominated by SVM evaluation, hence SMO is fastest for linear SVMs and sparse
data sets. For the MNIST database, SMO is 1.7 times faster than PCG chunking,
while for the UCI Adult database and linear SVMs, SMO can be 1500 times faster
than the PCG chunking algorithm.
· J. Platt, Using Sparseness and Analytic QP to Speed Training of Support Vector Machines, in Advances in Neural Information Processing Systems 11, M. S. Kearns, S. A. Solla, D. A. Cohn, eds., MIT Press, (1999). Available in PDF format (69K) or PostScript format (101K). This is a seven page paper, but it mentions the latest heuristics (which are about 2x faster than those described below) and also gives timing comparisons to Thorsten Joachims' SVMlight simulator.
·
J.
Platt, Fast Training of Support Vector
Machines using Sequential Minimal Optimization, in Advances in
Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges,
and A. Smola, eds., MIT Press, to appear, (1998). Available in PDF format (278K) or Compressed
PostScript format (111K). This is a peer-reviewed book chapter and has more
details than the technical report, below. The chapter does not contain an
overview of SVMs or a bibliography, however. Note: as of
· J. Platt, Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, Microsoft Research Technical Report MSR-TR-98-14, (1998). Available in PDF format (88K) or Compressed PostScript format (238 K). An older technical report about SMO --- contains an overview of SVMs.
There
are some unclear ideas and errors in the
“Fast Training” paper that I would like to clarify on this web page:
·
Equations (12.9) and (12.10) in the paper are for
when the decision function (1.7) subtracts b, not adds it.
·
Equation (12.18) has a sign error: the plus should
be a minus.
·
The sentence after (12.24) should read “is
negative” not “is positive”
·
In the pseudo-code, after the “if (eta < 0)”
statement, right before the “if (|a2-alph2| < …)” statement, a2 should be
checked to see if it is within 1e-8 of 0 or C. If so, then a2 should be set to
the nearest bound. This prevents round-off error from mistakenly forcing a
point to be a support vector. Richard Lippmann points out that this value of
1e-8 is appropriate when inputs have approximately unit variance. Therefore, it
is a good idea to scale your inputs (or kernel outputs) so that they are on the
order of unity.
I’ve
fixed the paper on the web site.
Also,
outliers can cause SVMs in general to behave oddly. For example, I’ve seen data
sets where the SVM result is all zero weights. Outliers can also cause
numerical problems in SMO. So, if you are getting strange results, I would
filter out any severe outliers from your data set.
The real-world data sets
described in the technical report are available in a compressed ASCII format
(zip format). Both the adult data and the web data are available. There is a readme.txt file in each
zip archive that explains the format of the file. The testing
set for the adult data, the testing set for the web
data set, and the MNIST
data set is also available.
Xianping Ge of UCI has implemented SMO in C++.
Chih-Chung
Chang and Chih-Jen Lin have
taken a version of SMO (from Keerthi), extended it, and wrapped it into a big
package, LIBSVM, that does a lot of different SVM-related tasks.
Gray Flake and Steve Lawrence have an efficient SMO
algorithm for Support
Vector Regression.
Sathiya
Keerthi and colleagues have a technical
report that describes an improved SMO: instead of updating a single threshold,
they update the bounds on permissible thresholds. They report substantial
improvement in speed, especially for extreme C values.
Do you have questions or
comments about the technical report or the data sets? Have you tried SMO on a
real problem? Send me e-mail! My e-mail address is jplatt AT microsoft.com
(non-standard form used to prevent spam).
This page was written by John Platt of the CCSP Group of Microsoft Research. Last updated: Sep
17, 2003.
![]()