Inductive and Example-Based Learning for Text Classification

Ye-Yi Wang, Xiao Li, and Alex Acero

Abstract

Text classification has been widely applied to many practical tasks. Inductive models trained from labeled data are the most commonly used technique. The basic assumption underlying an inductive model is that the training data are drawn from the same distribution as the test data. However, labeling such a training set is often expensive in practical applications. On the other hand, a large amount of labeled data drawn from a different distribution is often available in the same application domain. It is therefore highly desirable to take advantage of these data, even though their underlying distribution differs from that of the test set. This paper compares three text classification algorithms in this scenario: two inductive Maximum Entropy (MaxEnt) models, one flatly initialized and the other initialized with a term-frequency/inverse-document-frequency (Tf*Idf) weighted vector space model, and an example-based learning algorithm, which assigns a class label to a text by learning from the labels assigned to the training data that are similar to the text. Experimental results show that example-based learning achieves more than 5% improvement in precision across almost all coverage levels.
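The example-based idea in the abstract (label a text by looking at the labels of similar training texts) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple k-nearest-neighbor scheme over Tf*Idf vectors with cosine similarity, and the function names (`tfidf_vectors`, `vectorize`, `cosine`, `classify`) and the parameter `k` are hypothetical. The returned confidence (the winning label's share of the neighbors' similarity mass) is also an assumption, included only to show how a threshold on such a score could trade coverage against precision.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Build sparse Tf*Idf vectors (term -> weight) for tokenized training docs.

    Hypothetical helper: raw term counts times idf = log(N / df).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def vectorize(doc, idf):
    """Map a new doc into the training Tf*Idf space; unseen terms are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def classify(query, train_vecs, labels, idf, k=3):
    """Hypothetical example-based classifier: similarity-weighted vote of the k nearest examples.

    Returns (label, confidence), where confidence is the winning label's share
    of the total similarity among the k neighbors. Thresholding it is one
    assumed way to trade coverage for precision.
    """
    q = vectorize(query, idf)
    neighbors = sorted(
        ((cosine(q, v), y) for v, y in zip(train_vecs, labels)), reverse=True
    )[:k]
    scores = defaultdict(float)
    for sim, y in neighbors:
        scores[y] += sim
    label = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = scores[label] / total if total > 0 else 0.0
    return label, confidence
```

Usage: build the vectors once with `tfidf_vectors(train_docs)`, then call `classify(tokens, vecs, labels, idf)` for each test text, and accept only predictions whose confidence exceeds a chosen threshold. A training set drawn from a different distribution can be used directly here, since no model parameters are fit to it.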

Details

Publication type: Inproceedings
Published in: Interspeech
Pages: 1610-1613
Address: Brisbane, Australia
Publisher: International Speech Communication Association