Novel Acoustic Modeling with Structured Hidden Dynamics for Speech Coarticulation and Reduction

Li Deng; Dong Yu; Alex Acero; Xiaolong(Shiao-Long) Li

Proc. of the DARPA RT04 Workshop | November 2004

This paper reports our recent progress in developing, implementing, and evaluating a structured speech model with statistically characterized hidden trajectories. The unidirectional coarticulation modeling of the hidden trajectory models presented at previous EARS workshops has been extended to be bidirectional (forward as well as backward in time), offering significantly more power for parsimonious modeling of long-span context dependency. When appropriately implemented, the new model also exhibits contextually assimilated phonetic reduction, or phonetic target undershoot, which is prevalent in casual, fluent speech (e.g., conversational speech). In large-scale N-best rescoring experiments (N=1000), the model achieved substantially lower phone recognition error than a context-dependent (triphone) HMM system built with HTK. When the “error propagation” effect of the long-span acoustic model is artificially removed from the N-best rescoring paradigm by adding the reference hypotheses to the 1000-best lists, the error rate is reduced dramatically further.
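
The N-best rescoring paradigm mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rescore_nbest`, the hypothesis representation (a tuple of phone labels), and the toy scoring table are assumptions made here for illustration only. A real system would score each hypothesis with the long-span acoustic model instead of a lookup table.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumption: a hypothesis is represented as a tuple of phone labels.
Hypothesis = Tuple[str, ...]


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    score_fn: Callable[[Hypothesis], float],
    reference: Optional[Hypothesis] = None,
) -> Hypothesis:
    """Return the highest-scoring hypothesis from an N-best list.

    If `reference` is given, it is added to the candidate list first,
    mimicking the control condition in which the reference hypothesis
    is inserted into the N-best list.
    """
    candidates = list(nbest)
    if reference is not None and reference not in candidates:
        candidates.append(reference)
    if not candidates:
        raise ValueError("N-best list is empty")
    # Higher score means a better hypothesis (e.g., a log-likelihood).
    return max(candidates, key=score_fn)


# Toy scores standing in for the acoustic model (hypothetical values).
toy_scores = {
    ("s", "iy"): -12.0,
    ("s", "ih"): -10.5,
    ("z", "iy"): -9.0,
}


def toy_score(hyp: Hypothesis) -> float:
    # Unknown hypotheses get the worst possible score.
    return toy_scores.get(hyp, float("-inf"))


nbest_list = [("s", "iy"), ("s", "ih")]
best_without_ref = rescore_nbest(nbest_list, toy_score)
best_with_ref = rescore_nbest(nbest_list, toy_score, reference=("z", "iy"))
```

In this toy setup the rescoring can only pick among the hypotheses it is given, so a better-scoring reference is selected only once it has been added to the list. That is the sense in which adding the reference removes the dependence on the first-pass recognizer's candidates.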