Li Deng, L. Lee, H. Attias, and Alex Acero
May 2004
A novel approach is developed for efficient and accurate tracking
of vocal tract resonances, which are natural frequencies of the resonator
from larynx to lips, in fluent speech. The tracking algorithm
is based on a version of the structured speech model consisting
of continuous-valued hidden dynamics and a piecewise-linearized
prediction function from resonance frequencies and bandwidths
to LPC cepstra. We present details of the piecewise linearization
design process and an adaptive training technique for the parameters
that characterize the prediction residuals. An iterative
tracking algorithm is described and evaluated that embeds both the
prediction-residual training and the piecewise linearization design
in an adaptive Kalman filtering framework. Experiments on tracking
vocal tract resonances in Switchboard speech data demonstrate
high tracking accuracy and the effectiveness of the residual
training embedded in the algorithm. Our approach differs from
traditional formant trackers in that it provides meaningful results
even during consonantal closures when the supra-laryngeal source
may cause no spectral prominences in speech acoustics.
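
To make the prediction function concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard all-pole relation between resonances and LPC cepstra, c_n = (2/n) * sum_k exp(-pi*n*b_k/fs) * cos(2*pi*n*f_k/fs), which the abstract itself does not spell out, and it linearizes that mapping with a first-order Taylor expansion around one operating point, as one region of a piecewise linearization might. The function names `lpc_cepstra` and `linearize` and all parameter names are hypothetical.

```python
import math


def lpc_cepstra(freqs_hz, bws_hz, fs_hz, order):
    """Map resonance frequencies and bandwidths (Hz) to LPC cepstra c_1..c_order.

    Assumed relation (standard for an all-pole resonator model):
        c_n = (2/n) * sum_k exp(-pi*n*b_k/fs) * cos(2*pi*n*f_k/fs)
    """
    cep = []
    for n in range(1, order + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bws_hz):
            decay = math.exp(-math.pi * n * b / fs_hz)
            total += decay * math.cos(2.0 * math.pi * n * f / fs_hz)
        cep.append(2.0 / n * total)
    return cep


def linearize(freqs_hz, bws_hz, fs_hz, order):
    """First-order Taylor expansion of lpc_cepstra around one operating point.

    Returns (c0, J): c0 is the cepstrum at the operating point, and J[n-1][j]
    is the partial derivative of c_n with respect to x_j, where the state
    vector is x = [f_1..f_K, b_1..b_K].
    """
    c0 = lpc_cepstra(freqs_hz, bws_hz, fs_hz, order)
    jac = []
    for n in range(1, order + 1):
        d_freq, d_bw = [], []
        for f, b in zip(freqs_hz, bws_hz):
            decay = math.exp(-math.pi * n * b / fs_hz)
            phase = 2.0 * math.pi * n * f / fs_hz
            # d c_n / d f_k = -(4*pi/fs) * exp(...) * sin(...)
            d_freq.append(-(4.0 * math.pi / fs_hz) * decay * math.sin(phase))
            # d c_n / d b_k = -(2*pi/fs) * exp(...) * cos(...)
            d_bw.append(-(2.0 * math.pi / fs_hz) * decay * math.cos(phase))
        jac.append(d_freq + d_bw)
    return c0, jac
```

Under these assumptions, calling `linearize([500, 1500], [80, 120], 8000, 12)` yields a 12-by-4 matrix that could serve as the observation matrix in a Kalman filter update for resonances near that operating point. The 8 kHz sampling rate here is an assumption chosen to match telephone-bandwidth speech.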
In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing
Type: Inproceedings