Original title: “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods”
John C. Platt, CCSP Group, Microsoft Research
Advances in Large Margin Classifiers,
A. Smola, P. Bartlett, B. Schölkopf,
D. Schuurmans,
The output of a classifier should be a calibrated posterior
probability to enable post-processing. Standard SVMs do not provide such
probabilities. One method to create probabilities is to directly train a kernel
classifier with a logit link function and a regularized maximum likelihood score.
However, training with a maximum likelihood score will produce non-sparse
kernel machines. Instead, we train an SVM, then train
the parameters of an additional sigmoid function to map the SVM outputs into
probabilities. This chapter compares classification error rate and likelihood
scores for an SVM plus sigmoid versus a kernel method trained with a
regularized likelihood error function. These methods are tested on three
data-mining-style data sets. The SVM+sigmoid yields
probabilities of comparable quality to the regularized maximum likelihood kernel
method, while still retaining the sparseness of the SVM.
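
The SVM-plus-sigmoid step described above can be illustrated with a small sketch. It assumes the sigmoid has the two-parameter form P(y=1 | f) = 1 / (1 + exp(A*f + B)), where f is the raw SVM output, and fits A and B by minimizing the mean log loss. Several things here are assumptions made for illustration rather than details taken from the chapter: the names `sigmoid_prob` and `fit_sigmoid`, the use of plain gradient descent as the optimizer, the raw 0/1 targets, and the synthetic Gaussian scores that stand in for the outputs of a trained SVM.

```python
import math
import random


def sigmoid_prob(f, A, B):
    """Map a raw classifier score f to P(y=1 | f) = 1 / (1 + exp(A*f + B))."""
    return 1.0 / (1.0 + math.exp(A * f + B))


def fit_sigmoid(scores, labels, n_iter=2000):
    """Fit A and B by gradient descent on the mean log loss.

    scores: raw classifier outputs; labels: 1 for positive, 0 for negative.
    """
    n = len(scores)
    A, B = 0.0, 0.0
    # Step size 1/L, where L = (mean(f^2) + 1) / 4 bounds the curvature
    # of the mean log loss, so each step cannot increase the loss.
    lr = 4.0 / (sum(f * f for f in scores) / n + 1.0)
    for _ in range(n_iter):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, labels):
            # Derivative of one example's log loss w.r.t. (A*f + B) is t - p.
            resid = t - sigmoid_prob(f, A, B)
            grad_A += resid * f
            grad_B += resid
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B


# Synthetic stand-in for SVM outputs: positives score higher on average.
rng = random.Random(0)
scores = [rng.gauss(1.0, 1.0) for _ in range(200)]
scores += [rng.gauss(-1.0, 1.0) for _ in range(200)]
labels = [1] * 200 + [0] * 200

A, B = fit_sigmoid(scores, labels)
probs = [sigmoid_prob(f, A, B) for f in (-2.0, 0.0, 2.0)]
print(A, B, probs)
```

With this parametrization a negative A means that larger scores map to larger probabilities. The SVM itself is left untouched, so its sparseness is preserved; only the two scalar parameters are trained on top of it. In a real setting the scores would come from the trained SVM, ideally evaluated on examples that were not used to fit it.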