A Comparative Study of Recurrent Neural Network Models for Lexical Domain Classification

  • Suman Ravuri,
  • Andreas Stolcke

Proc. IEEE ICASSP

Published by IEEE (Institute of Electrical and Electronics Engineers)

Domain classification is a critical pre-processing step for many speech understanding and dialog systems, as it allows certain types of utterances to be routed to specialized subsystems. In previous work, we explored various neural network (NN) architectures for binary utterance classification based on lexical features and found that they improved upon more traditional statistical baselines. In this paper we generalize to an n-way classification task and test the best-performing NN architectures on a large, real-world dataset from the Cortana personal assistant application. As in the earlier work, we find that recurrent NNs with gated memory units (LSTM and GRU) perform best, outperforming state-of-the-art baseline systems based on language models or boosting classifiers. NN classifiers can still benefit from combining their posterior class estimates with traditional language model (LM) likelihood ratios via a logistic regression combiner. We also investigate whether it is better to use an ensemble of binary classifiers or a single NN trained for n-way classification, and how each approach performs in combination with the baseline classifiers. The best overall results are obtained by first combining an ensemble of binary GRU-NN classifiers with LM likelihood ratios via logistic regression, and then picking the class with the highest posterior estimate.
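
To make the winning score-combination scheme concrete, here is a minimal sketch of per-domain logistic-regression fusion followed by an argmax decision. This is an illustration under stated assumptions, not the paper's implementation: the arrays gru_posteriors and lm_llr are synthetic stand-ins for the real binary GRU posteriors and LM likelihood ratios, and scikit-learn's LogisticRegression stands in for whatever combiner training the authors used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: N utterances, K domains.
# gru_posteriors[i, k] - posterior from a binary GRU classifier for domain k.
# lm_llr[i, k]         - LM log-likelihood ratio for domain k vs. background.
# Both are random stand-ins here, so accuracy will be near chance; real
# system scores would be informative.
rng = np.random.default_rng(0)
N, K = 1000, 5
labels = rng.integers(0, K, size=N)
gru_posteriors = rng.random((N, K))
lm_llr = rng.normal(size=(N, K))

# Per-domain logistic regression combiner: for each domain k, fuse the
# binary GRU posterior and the LM likelihood ratio into a calibrated
# binary score. In practice the combiner would be trained on held-out
# data rather than fit and applied to the same set, as done here.
combined = np.zeros((N, K))
for k in range(K):
    X = np.column_stack([gru_posteriors[:, k], lm_llr[:, k]])
    y = (labels == k).astype(int)  # one-vs-rest target for domain k
    clf = LogisticRegression().fit(X, y)
    combined[:, k] = clf.predict_proba(X)[:, 1]

# n-way decision: pick the domain with the highest combined score.
predictions = combined.argmax(axis=1)
print("accuracy:", (predictions == labels).mean())
```

The design mirrors the abstract's description: each domain is scored independently by a binary classifier plus LM evidence, the two score streams are fused per domain, and the n-way label is the argmax over the fused per-domain scores.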