A Quantitative Model for Formant Dynamics and Contextually Assimilated Reduction in Fluent Speech

Li Deng, Dong Yu, and Alex Acero

Abstract

A quantitative model of coarticulation is presented that accurately predicts formant dynamics in fluent speech using prior information about the resonance targets in the phone sequence, in the absence of actual acoustic data. Realistic formant undershoot (reduction) and “static” sound confusion are produced naturally by the model for fast-rate speech in a contextually assimilated manner. The model is capable of resolving this confusion through dynamic speech specification. As a source of a priori knowledge about speech structure, the model is a central component of our Bayesian generative modeling approach to automatic recognition of conversational speech, where varying degrees of sound reduction abound due to freely varying speaking style and rate. We present details of the model simulation, which demonstrates quantitative effects of speaking rate and segment duration on the magnitude of reduction that agree closely with experimental measurements in the acoustic-phonetic literature. The model simulation also gives quantitative effects of varying the “stiffness” parameter in the model.
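
The qualitative effects described in the abstract (more undershoot for shorter segments and for a "stiffer" system) can be illustrated with a toy simulation. The sketch below is not the authors' model: it assumes, purely for illustration, that a formant trajectory is obtained by smoothing a piecewise-constant target track with a symmetric kernel proportional to gamma**|k|. The names `smooth_targets`, `undershoot`, `gamma`, and `half_width`, and the target values of 500 Hz and 800 Hz, are all hypothetical.

```python
def smooth_targets(targets, gamma, half_width):
    """Smooth a per-frame target track with a symmetric kernel gamma**|k|.

    Assumption for illustration only: the kernel shape and the role of
    `gamma` as a stiffness-like parameter are not taken from the abstract.
    """
    offsets = range(-half_width, half_width + 1)
    weights = [gamma ** abs(k) for k in offsets]
    total = sum(weights)  # normalize so a constant track is left unchanged
    n = len(targets)
    smoothed = []
    for t in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            idx = min(max(t + k, 0), n - 1)  # clamp indices at the track edges
            acc += w * targets[idx]
        smoothed.append(acc / total)
    return smoothed


def undershoot(duration_frames, gamma=0.85, half_width=30,
               context=500.0, vowel=800.0, pad=40):
    """Return how far (in Hz) the smoothed peak falls short of the vowel target."""
    # Hypothetical target track: context, then a vowel segment, then context.
    track = [context] * pad + [vowel] * duration_frames + [context] * pad
    smoothed = smooth_targets(track, gamma, half_width)
    peak = max(smoothed[pad:pad + duration_frames])
    return vowel - peak


long_gap = undershoot(20)               # longer segment (slower speech)
short_gap = undershoot(5)               # shorter segment (faster speech)
stiff_gap = undershoot(10, gamma=0.9)   # broader kernel, more sluggish
loose_gap = undershoot(10, gamma=0.7)   # narrower kernel, more responsive

print(long_gap, short_gap, stiff_gap, loose_gap)
```

In this toy setting the shorter segment falls further short of its target than the longer one, and a larger `gamma` increases the shortfall for a fixed duration. It only mirrors the direction of the effects named in the abstract, not their measured magnitudes.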

Details

Publication type: Inproceedings
Published in: Proc. Int. Conf. on Spoken Language Processing
Publisher: International Speech Communication Association