Inductive Learning Algorithms and Representations for Text Categorization

David Heckerman, John Platt, Mehran Sahami, and Susan Dumais

Abstract

Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine the effects of training set size and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.
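The following sketch illustrates why a linear classifier such as a linear SVM is quick to evaluate. It assumes a model has already been trained and gives each category its own weight vector and bias, so a document can receive zero, one, or several categories. Scoring a document then takes one sparse dot product per category, summed only over the terms the document contains. The tokenizer (`binary_features`), the category names, and the weights are hypothetical placeholders. They are not the paper's document representation or learned parameters, and training is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical model: one linear classifier (weights, bias) per category.
Model = Dict[str, Tuple[Dict[str, float], float]]


def binary_features(text: str) -> Dict[str, float]:
    """Hypothetical representation: lowercase whitespace tokens, 1.0 if present."""
    return {token: 1.0 for token in text.lower().split()}


def linear_score(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Sparse dot product w.x + b; cost grows with the document's distinct terms."""
    return sum(value * weights.get(term, 0.0) for term, value in features.items()) + bias


def categorize(text: str, model: Model) -> List[str]:
    """Assign every category whose linear score is positive."""
    features = binary_features(text)
    return [name for name, (weights, bias) in model.items()
            if linear_score(features, weights, bias) > 0.0]


# Made-up weights for illustration only; a real model would be learned from examples.
model: Model = {
    "earnings": ({"profit": 1.2, "quarterly": 0.8, "dividend": 1.0}, -1.0),
    "acquisitions": ({"acquire": 1.5, "merger": 1.4, "stake": 0.9}, -1.0),
}

print(categorize("Quarterly profit rose after merger talks", model))
# prints ['earnings', 'acquisitions']
```

Because each category is scored independently, adding categories adds one dot product each. The expensive work of choosing the weights happens only at training time.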

Details

Publication type: Inproceedings
Published in: 7th International Conference on Information and Knowledge Management
Pages: 148-152