A Quantitative Model for Formant Dynamics and Contextually Assimilated Reduction in Fluent Speech

  • Li Deng
  • Dong Yu
  • Alex Acero

Proc. Int. Conf. on Spoken Language Processing

Published by International Speech Communication Association

A quantitative model of coarticulation is presented that accurately predicts formant dynamics in fluent speech using prior information about the resonance targets in the phone sequence, in the absence of actual acoustic data. Realistic formant undershoot (reduction) and “static” sound confusion are produced naturally by the model for fast-rate speech in a contextually assimilated manner. The model is capable of resolving this confusion by means of dynamic speech specification. As a source of a priori knowledge about speech structure, the model is a central component of our Bayesian generative modeling approach to automatic recognition of conversational speech, where varying degrees of sound reduction abound due to freely varying speaking style and rate. We present details of the model simulation, which demonstrates quantitative effects of speaking rate and segment duration on the magnitude of reduction that agree closely with experimental measurements in the acoustic-phonetic literature. The model simulation also shows the quantitative effects of varying the “stiffness” parameter in the model.
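
The abstract does not give the model equations, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes that a formant trajectory can be approximated by smoothing a piecewise-constant sequence of per-phone targets with a symmetric, exponentially decaying window. The names `smooth_targets` and `vowel_peak`, the smoothing factor `gamma` (a hypothetical stand-in for a stiffness-like parameter), the window half-width, and all frequency values are assumptions chosen for illustration.

```python
def smooth_targets(targets, gamma, half_width):
    """Smooth a per-frame target sequence with a symmetric window.

    Assumed form: weight gamma**|k| for offsets k in [-half_width, half_width],
    renormalized at the sequence edges. This is an illustrative choice only.
    """
    offsets = range(-half_width, half_width + 1)
    weights = [gamma ** abs(k) for k in offsets]
    n = len(targets)
    trajectory = []
    for t in range(n):
        num = 0.0
        den = 0.0
        for k, w in zip(offsets, weights):
            idx = t + k
            if 0 <= idx < n:  # ignore frames outside the utterance
                num += w * targets[idx]
                den += w
        trajectory.append(num / den)
    return trajectory


def vowel_peak(duration_frames, gamma=0.6, half_width=7,
               context_frames=10, context_hz=500.0, vowel_hz=2000.0):
    """Peak formant value reached in a vowel flanked by two context segments.

    All targets and durations here are made-up illustration values.
    """
    targets = ([context_hz] * context_frames
               + [vowel_hz] * duration_frames
               + [context_hz] * context_frames)
    return max(smooth_targets(targets, gamma, half_width))


if __name__ == "__main__":
    # Shorter vowels fall further short of the 2000 Hz target (undershoot).
    for frames in (20, 8, 4):
        print(frames, round(vowel_peak(frames), 1))
```

In this toy setting, a long vowel reaches its target while a short one falls short of it and is pulled toward the neighboring targets, which is the qualitative pattern of duration-dependent reduction the abstract describes. Changing `gamma` changes how strongly the context is blended in.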