Ruhi Sarikaya, Geoffrey Hinton, and Anoop Deoras
Applications of deep belief nets (DBNs) to various problems have been the subject of a number of recent studies, ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pre-training method that uses an efficient learning algorithm called contrastive divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data, and the features discovered by this model are then used to initialize a feed-forward neural network, which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: support vector machines (SVM), boosting, and maximum entropy (MaxEnt). The plain DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models. However, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, performed better than both MaxEnt and boosting.
In IEEE Transactions on Audio, Speech, and Language Processing