Learning at Low False Positive Rates

W. Yih, J. Goodman, and G. Hulten

Abstract

Most spam filters are configured for use at a very low false-positive rate. Typically, the filters are trained with techniques that optimize accuracy or entropy, rather than performance in this configuration. We describe two different techniques for optimizing for the low false-positive region. One method weights good (non-spam) data more than spam. The other method uses a two-stage technique of first finding data in the low false-positive region, and then learning using this subset. We show that with two different learning algorithms, logistic regression and Naive Bayes, we achieve substantial improvements, reducing missed spam by as much as 20% (relative) for logistic regression and 40% for Naive Bayes at the same low false-positive rate.
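The two ideas in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Good-message weighting is modeled here as a per-example weight `good_weight` on the log loss of a logistic regression. The two-stage method is modeled as training a second model only on the examples that a first model scores at or above a hypothetical `cutoff`. That is one plausible reading of "first finding data in the low false-positive region". The evaluation metric, missed spam at a fixed false-positive rate, is computed by `miss_rate_at_fp`. All function names, parameters, and the score-rescaling rule in `train_two_stage` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical data layout: (feature vector, label), label 1 = spam, 0 = good.
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Spam probability under a linear logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_weighted_logreg(data: List[Example], good_weight: float = 1.0,
                          lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Batch gradient descent on a weighted log loss: each good (label 0)
    example counts good_weight times, each spam example counts once."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    total = sum(good_weight if y == 0 else 1.0 for _, y in data)
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in data:
            c = good_weight if y == 0 else 1.0
            err = c * (predict(w, b, x) - y)  # d(weighted log loss)/d(logit)
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / total for wj, g in zip(w, gw)]
        b -= lr * gb / total
    return w, b


def miss_rate_at_fp(scores_labels: List[Tuple[float, int]],
                    max_fp_rate: float) -> Tuple[float, float]:
    """Choose a threshold (spam iff score > threshold) that keeps the
    false-positive rate on good mail at or below max_fp_rate, and return
    (threshold, fraction of spam missed)."""
    good = sorted((s for s, y in scores_labels if y == 0), reverse=True)
    spam = [s for s, y in scores_labels if y == 1]
    k = int(max_fp_rate * len(good))  # good messages allowed above the threshold
    threshold = good[k] if k < len(good) else -math.inf
    missed = sum(1 for s in spam if s <= threshold)
    return threshold, missed / len(spam)


def train_two_stage(data: List[Example], cutoff: float,
                    **kw) -> Callable[[Sequence[float]], float]:
    """Hypothetical two-stage scheme: stage 2 is trained only on the
    examples that stage 1 scores at or above `cutoff`."""
    w1, b1 = train_weighted_logreg(data, **kw)
    subset = [(x, y) for x, y in data if predict(w1, b1, x) >= cutoff]
    if len({y for _, y in subset}) < 2:
        return lambda x: predict(w1, b1, x)  # empty or one-class subset: keep stage 1
    w2, b2 = train_weighted_logreg(subset, **kw)

    def score(x: Sequence[float]) -> float:
        s1 = predict(w1, b1, x)
        if s1 < cutoff:
            return s1  # outside the selected region: keep the stage-1 score
        # Inside the region: rescale the stage-2 score into [cutoff, 1] so it
        # always ranks above messages that stage 1 placed below the cutoff.
        return cutoff + (1.0 - cutoff) * predict(w2, b2, x)

    return score
```

To compare configurations in the spirit of the abstract, one would score a held-out set with models trained at different `good_weight` values (or with and without the second stage). The comparison is then the miss rate that `miss_rate_at_fp` reports at the same small `max_fp_rate`. The rescaling in `score` is only one way to merge the two stages into a single ranking. It is chosen here so that a single threshold still applies.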

Details

Publication type: Inproceedings
Published in: Proceedings of the 3rd Conference on Email and Anti-Spam
Publisher: CEAS