Jui-Ting Huang, Xiao Li, and Alex Acero
This paper addresses the problem of discriminative training of language models that does not require any transcribed acoustic data. We propose to minimize the conditional entropy of word sequences given phone sequences, and present two settings in which this criterion can be applied. In an inductive learning setting, the phonetic/acoustic confusability information is given by a general phone error model. A transductive approach, in contrast, obtains that information by running a speech recognizer on test-set acoustics, with the goal of optimizing the test-set performance. Experiments show significant recognition accuracy improvements in both rescoring and first-pass decoding experiments using the transductive approach, and mixed results using the inductive approach.
© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/