Haihua Xu, Daniel Povey, Jie Zhu, and Guanyong Wu
In this paper we show how methods for approximating phone
error, as normally used in Minimum Phone Error (MPE) discriminative
training, can instead be used as a decoding criterion
for lattice rescoring. This is an alternative to Confusion Networks
(CN), which are commonly used in speech recognition.
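For readers unfamiliar with the phone-error approximation used in MPE training, the following is a minimal sketch of the standard approximate phone accuracy (Povey and Woodland). A hypothesis phone q is compared with each reference phone z it overlaps. It scores -1 + 2e if the labels match and -1 + e otherwise, where e is the fraction of z's duration covered by q, and the best such score is kept. The `Phone` class, the frame-based timings and the function names are illustrative assumptions. The abstract does not say how MHPE adapts this approximation for decoding, so the sketch shows only the MPE-style scoring against a fixed reference.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phone:
    """A phone label with a frame span [start, end). Hypothetical helper type."""
    label: str
    start: int
    end: int

def overlap_fraction(q: Phone, z: Phone) -> float:
    """Fraction of reference phone z's duration that hypothesis phone q covers."""
    overlap = min(q.end, z.end) - max(q.start, z.start)
    return max(0, overlap) / (z.end - z.start)

def approx_phone_accuracy(q: Phone, reference: list[Phone]) -> float:
    """MPE-style approximate accuracy of hypothesis phone q against a reference.

    For each overlapping reference phone z with overlap fraction e:
      same label      -> -1 + 2e
      different label -> -1 + e
    The maximum over z is returned; -1 (an insertion) if q overlaps nothing.
    """
    best = -1.0
    for z in reference:
        e = overlap_fraction(q, z)
        if e > 0.0:
            score = -1.0 + 2.0 * e if q.label == z.label else -1.0 + e
            best = max(best, score)
    return best

reference = [Phone("k", 0, 10), Phone("ae", 10, 20), Phone("t", 20, 30)]
print(approx_phone_accuracy(Phone("k", 0, 10), reference))    # 1.0  (correct)
print(approx_phone_accuracy(Phone("ae", 5, 15), reference))   # 0.0  (half-aligned match)
print(approx_phone_accuracy(Phone("ih", 10, 20), reference))  # 0.0  (substitution)
print(approx_phone_accuracy(Phone("t", 30, 40), reference))   # -1.0 (insertion)
```

Under this scheme a correctly aligned phone scores 1, a fully aligned substitution 0, and a phone that overlaps nothing (an insertion) -1. These mirror the counts that make up phone error, while remaining local enough to compute over a lattice.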
The standard Maximum A Posteriori (MAP) decoding approach is a
Minimum Bayes Risk (MBR) estimate with respect to the Sentence Error
Rate (SER); however, we are typically more interested in
the Word Error Rate (WER). Methods such as CN and our proposed
Minimum Hypothesis Phone Error (MHPE) aim to come
closer to minimizing the expected WER. In preliminary
experiments, we find that our approach gives a larger improvement
than CN and is conceptually simpler.
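The MBR view described above can be illustrated on a toy N-best list. The sketch below is generic N-best MBR with a word-level edit-distance loss. It is not MHPE and not CN, and the hypotheses and posteriors are invented for illustration. MAP picks the most probable hypothesis, which minimizes the expected 0/1 (sentence-level) loss. MBR instead picks the hypothesis with the lowest expected number of word errors, measured against every hypothesis and weighted by its posterior. On this list the two decisions differ.

```python
# Generic N-best MBR decoding vs. MAP decoding (illustrative sketch, not MHPE).
Hypothesis = list[str]

def edit_distance(a: Hypothesis, b: Hypothesis) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion of wa
                           cur[j - 1] + 1,             # insertion of wb
                           prev[j - 1] + (wa != wb)))  # match or substitution
        prev = cur
    return prev[-1]

def map_decode(nbest: list[tuple[Hypothesis, float]]) -> Hypothesis:
    """MAP: highest posterior, i.e. minimum expected sentence (0/1) error."""
    return max(nbest, key=lambda item: item[1])[0]

def mbr_decode(nbest: list[tuple[Hypothesis, float]]) -> Hypothesis:
    """MBR w.r.t. word errors: minimize posterior-weighted expected edit distance."""
    total = sum(p for _, p in nbest)

    def expected_word_errors(hyp: Hypothesis) -> float:
        return sum(p * edit_distance(hyp, other) for other, p in nbest) / total

    return min((hyp for hyp, _ in nbest), key=expected_word_errors)

# Toy N-best list with made-up posteriors.
nbest = [
    (["the", "cat", "sat"], 0.4),
    (["the", "cat", "sad"], 0.3),
    (["a", "cat", "sad"], 0.3),
]
print("MAP:", " ".join(map_decode(nbest)))  # MAP: the cat sat
print("MBR:", " ".join(mbr_decode(nbest)))  # MBR: the cat sad
```

Here the expected word errors are 0.9, 0.7 and 1.1 for the three hypotheses, so MBR chooses the second even though the first is more probable. A hypothesis's expected 0/1 loss is simply 1 - P(h), which is why MAP is the MBR decision under sentence error. Approximating word-level expectations efficiently over a whole lattice, rather than a short N-best list, is what approaches such as CN and MHPE aim at.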
Published in: Interspeech 2009
Publisher: International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.