John C. Platt, CCSP Group, Microsoft Research
Chapter 12, Advances in Kernel
Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola,
This chapter describes a new algorithm for training
Support Vector Machines: Sequential Minimal Optimization, or SMO. Training a
Support Vector Machine (SVM) requires the solution of a very large quadratic
programming (QP) optimization problem. SMO breaks this large QP problem into a
series of smallest possible QP problems. These small QP problems are solved
analytically, which avoids using a time-consuming numerical QP optimization as
an inner loop. The amount of memory required for SMO is linear in the training
set size, which allows SMO to handle very large training sets. Because large
matrix computation is avoided, SMO scales somewhere between linear and
quadratic in the training set size for various test problems, while a standard
projected conjugate gradient (PCG) chunking algorithm scales somewhere between
linear and cubic in the training set size. SMO's computation time is dominated
by SVM evaluation, hence SMO is fastest for linear SVMs and sparse data sets.
For the MNIST database, SMO is as fast as PCG chunking; while for the UCI Adult
database and linear SVMs, SMO can be more than 1000 times faster than the PCG
chunking algorithm.
PDF file (290KB)
PS file (261 KB)
A version of the paper was also published at the NIPS conference, as “Using Analytic QP and Sparseness to Speed Training of Support Vector Machines,” by J.C. Platt, Advances in Neural Information Processing Systems 11, pp. 557-563, (1999):
PDF file (70 KB)
PS file (101KB)