Li Deng, Dong Yu, and Alex Acero
We report our new development of a hidden trajectory model for co-articulated, time-varying patterns of speech. The model uses bi-directional filtering of vocal tract resonance targets to jointly represent contextual variation and phonetic reduction in speech acoustics. A novel maximum-likelihood-based learning algorithm is presented that accurately estimates the distributional parameters of the resonance targets. The results of the estimates are analyzed and shown to be consistent with all the relevant acoustic-phonetic facts and intuitions. Phonetic recognition experiments demonstrate that the model with more rigorous target training outperforms the most recent earlier version of the model, producing 17.5% fewer errors in N-best rescoring.
In Proc. of the Interspeech Conference
Publisher International Speech Communication Association
© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.