Xiao Li, Asela Gunawardana, and Alex Acero
Unforeseen user intents can account for a significant portion of unsuccessful calls in an automatic voice response system. Discovering these unforeseen semantic intents usually requires expensive manual transcriptions. We propose a method to cluster the acoustics from logged calls by their estimated semantic intents. This is achieved through training a mixture of language models in an unsupervised manner. Each cluster is presented to the application developer with a suggested language model to cover the semantic intent of the data in that cluster. The application developer validates the cluster and its suggested language model, and then updates the application. A quantitative evaluation on a corporate voice-dialer application shows that updating the application in this manner yields a 13.4% relative reduction in semantic error rate.
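The idea of training a mixture of language models without supervision can be illustrated with a minimal sketch. This is not the authors' implementation: the paper clusters acoustics, whereas this sketch assumes each logged call is already available as a word string (for example, a recognition hypothesis), uses unigram models, and fits them with a plain EM loop. The function name `train_unigram_mixture`, its parameters, and the example utterances are all hypothetical.

```python
import math
from collections import Counter


def train_unigram_mixture(utterances, num_clusters, iterations=20, smoothing=0.1):
    """Fit a mixture of unigram language models with EM (illustrative only).

    Assumption: utterances are plain word strings, standing in for the
    acoustic data that the paper actually clusters.
    """
    docs = [Counter(u.split()) for u in utterances]
    vocab = sorted({w for d in docs for w in d})

    # Deterministic soft initialization: utterance i leans toward cluster
    # i % num_clusters, and each row of responsibilities sums to 1.
    resp = [
        [(2.0 if k == i % num_clusters else 1.0) / (num_clusters + 1)
         for k in range(num_clusters)]
        for i in range(len(docs))
    ]

    for _ in range(iterations):
        # M-step: re-estimate cluster priors and smoothed unigram models.
        priors = [sum(r[k] for r in resp) / len(docs) for k in range(num_clusters)]
        models = []
        for k in range(num_clusters):
            counts = {w: smoothing for w in vocab}
            for d, r in zip(docs, resp):
                for w, c in d.items():
                    counts[w] += r[k] * c
            total = sum(counts.values())
            models.append({w: c / total for w, c in counts.items()})

        # E-step: posterior probability of each cluster given each utterance,
        # computed in log space and normalized for numerical stability.
        resp = []
        for d in docs:
            logp = [
                math.log(max(priors[k], 1e-12))
                + sum(c * math.log(models[k][w]) for w, c in d.items())
                for k in range(num_clusters)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])

    return priors, models, resp


# Hypothetical logged utterances mixing two intents.
calls = [
    "call john smith", "main menu", "call mary jones",
    "go back to menu", "call john jones", "menu please",
]
priors, models, resp = train_unigram_mixture(calls, num_clusters=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

In this sketch each cluster's unigram model plays the role of the suggested language model that a developer would inspect: its highest-probability words hint at the intent the cluster covers.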
In International Conference on Acoustics, Speech, and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers, Inc.
© 2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.