Susan Dumais, Microsoft Research
John Platt, Microsoft Research
David Heckerman, Microsoft Research
Mehran Sahami, Microsoft Research (current affiliation: Google)
Proceedings of the 7th International Conference on Information and Knowledge Management, pp. 148-155 (1998).
Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.
Keywords: classification, information management, machine learning, support vector machines, text categorization
© ACM, 1998. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in the Proceedings of the 7th International Conference on Information and Knowledge Management, November 1998, http://doi.acm.org/10.1145/288627.288651