Inductive and Example-Based Learning for Text Classification

  • Ye-Yi Wang ,
  • Xiao Li ,
  • Alex Acero

Interspeech |

Published by International Speech Communication Association

Text classification has been widely applied to many practical tasks. Inductive models trained from labeled data are the most commonly used technique. The basic assumption underlying an inductive model is that the training data are drawn from the same distribution as the test data. However, labeling such a training set is often expensive for practical applications. On the other hand, a large amount of labeled data, which have been drawn from a different distribution, is often available in the same application domain. It is thus very desirable to take advantage of these data even though there is a discrepancy between their underlying distribution and that of the test set. This paper compares three text classification algorithms applied in this scenario, including two inductive Maximum Entropy (MaxEnt) models, one flatly initialized and the other initialized with a term-frequency/inverse document frequency (Tf*Idf) weighted vector space model, and an example-based learning algorithm, which assigns a class label to a text by learning from the labels assigned to the training data that are similar to the text. Experiment results show that example-based learning has achieved more than 5% improvement in precisions across almost all coverage levels.