A Structured Speech Model with Continuous Hidden Dynamics and Prediction-Residual Training for Tracking Vocal Tract Resonances

A novel approach is developed for efficient and accurate tracking of vocal tract resonances, which are natural frequencies of the resonator from larynx to lips, in fluent speech. The tracking algorithm is based on a version of the structured speech model consisting of continuous-valued hidden dynamics and a piecewise-linearized prediction function from resonance frequencies and bandwidths to LPC cepstra. We present details of the piecewise linearization design process and an adaptive training technique for the parameters that characterize the prediction residuals. An iterative tracking algorithm is described and evaluated that embeds both the prediction-residual training and the piecewise linearization design in an adaptive Kalman filtering framework. Experiments on tracking vocal tract resonances in Switchboard speech data demonstrate high accuracy in the results, as well as the effectiveness of residual training embedded in the algorithm. Our approach differs from traditional formant trackers in that it provides meaningful results even during consonantal closures when the supra-laryngeal source may cause no spectral prominences in speech acoustics.
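
As an illustration of the prediction function from resonance frequencies and bandwidths to LPC cepstra mentioned in the abstract, the following is a minimal sketch of the standard all-pole relation, in which each resonance contributes a conjugate pole pair with radius exp(-pi*b/fs) and angle 2*pi*f/fs. The abstract does not state the exact form used in the paper, so the function name, the cepstral order, and the 8 kHz sampling rate are assumptions made for this example; the piecewise linearization and the residual training are not shown.

```python
import math


def vtr_to_lpc_cepstra(freqs_hz, bandwidths_hz, n_cepstra=12, sample_rate_hz=8000.0):
    """Map resonance frequencies and bandwidths to LPC cepstra of orders 1..n_cepstra.

    Assumed all-pole relation (not taken from the abstract):
        c_n = (2 / n) * sum_k exp(-pi * n * b_k / fs) * cos(2 * pi * n * f_k / fs)
    """
    cepstra = []
    for n in range(1, n_cepstra + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            decay = math.exp(-math.pi * n * b / sample_rate_hz)         # pole radius to the n-th power
            phase = math.cos(2.0 * math.pi * n * f / sample_rate_hz)    # cosine of n times the pole angle
            total += decay * phase
        cepstra.append(2.0 / n * total)
    return cepstra


# Hypothetical resonance values, chosen only to exercise the function.
example = vtr_to_lpc_cepstra([500.0, 1500.0, 2500.0], [60.0, 80.0, 120.0])
```

Because this mapping is nonlinear in the frequencies and bandwidths, a Kalman filter cannot use it directly as an observation equation, which is the motivation for linearizing it piecewise.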

In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing

Details

Type: Inproceedings