Inductive Learning Algorithms and Representations for Text Categorization

Authors

Susan Dumais, Microsoft Research

John Platt, Microsoft Research

David Heckerman, Microsoft Research

Mehran Sahami, Microsoft Research (current affiliation: Google)

Reference

Proceedings of the 7th International Conference on Information and Knowledge Management, pp. 148-155 (1998).

Abstract

Text categorization – the assignment of natural language texts to one or more predefined categories based on their content – is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.

Keywords

classification, information management, machine learning, support vector machines, text categorization

Paper Link

© ACM, 1998. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in the Proceedings of the 7th International Conference on Information and Knowledge Management, November 1998, http://doi.acm.org/10.1145/288627.288651
