Adaptive Kalman Smoothing for Tracking Vocal Tract Resonances Using a Continuous-Valued Hidden Dynamic Model

  • Li Deng
  • Leo Lee
  • Hagai Attias
  • Alex Acero

IEEE Transactions on Audio, Speech and Language Processing, Vol 15: pp. 13-23

A novel Kalman filtering/smoothing algorithm is presented for efficient and accurate estimation of vocal tract resonances or formants, which are natural frequencies and bandwidths of the resonator from larynx to lips, in fluent speech. The algorithm uses a hidden dynamic model, with a state-space formulation, where the resonance frequency and bandwidth values are treated as continuous-valued hidden state variables. The observation equation of the model is constructed by an analytical predictive function from the resonance frequencies and bandwidths to LPC cepstra as the observation vectors. This nonlinear function is adaptively linearized, and a residual or bias term, which is adaptively trained, is added to the nonlinear function to represent the iteratively reduced piecewise linear approximation error. Details of the piecewise linearization design process are described. An iterative tracking algorithm is presented, which embeds both the adaptive residual training and piecewise linearization design in the Kalman filtering/smoothing framework. Experiments on estimating resonances in Switchboard speech data show accurate estimation results. In particular, the effectiveness of the adaptive residual training is demonstrated. Our approach provides a solution to the traditional “hidden formant problem,” and produces meaningful results even during consonantal closures when the supra-laryngeal source may cause no spectral prominences in speech acoustics.
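
The abstract does not spell out the predictive function, so the following is only a minimal sketch under stated assumptions. It uses the standard all-pole relation, in which each resonance with frequency f_k and bandwidth b_k contributes (2/n) exp(-pi n b_k / fs) cos(2 pi n f_k / fs) to the n-th LPC cepstral coefficient, and it linearizes that mapping with a plain first-order Taylor expansion rather than the piecewise linearization and trained residual described above. The function names, the 8 kHz sampling rate, and the cepstral order are illustrative choices, not taken from the paper.

```python
import numpy as np

def formants_to_cepstra(freqs_hz, bws_hz, n_cep=15, fs=8000.0):
    """Map resonance frequencies/bandwidths (Hz) to LPC cepstra c[1..n_cep].

    Assumes an all-pole model with one conjugate pole pair per resonance.
    """
    f = np.asarray(freqs_hz, dtype=float)
    b = np.asarray(bws_hz, dtype=float)
    n = np.arange(1, n_cep + 1)[:, None]          # cepstral index, shape (n_cep, 1)
    decay = np.exp(-np.pi * n * b / fs)           # pole radius raised to the n-th power
    terms = (2.0 / n) * decay * np.cos(2.0 * np.pi * n * f / fs)
    return terms.sum(axis=1)                      # sum over resonances

def cepstra_jacobian(freqs_hz, bws_hz, n_cep=15, fs=8000.0):
    """Jacobian of the mapping w.r.t. the state x = [f_1..f_K, b_1..b_K]."""
    f = np.asarray(freqs_hz, dtype=float)
    b = np.asarray(bws_hz, dtype=float)
    n = np.arange(1, n_cep + 1)[:, None]
    decay = np.exp(-np.pi * n * b / fs)
    phase = 2.0 * np.pi * n * f / fs
    d_f = -(4.0 * np.pi / fs) * decay * np.sin(phase)   # d c_n / d f_k
    d_b = -(2.0 * np.pi / fs) * decay * np.cos(phase)   # d c_n / d b_k
    return np.hstack([d_f, d_b])                        # shape (n_cep, 2K)

def linearize(freqs_hz, bws_hz, n_cep=15, fs=8000.0):
    """First-order expansion around (f, b): cepstra ~= H @ x + offset."""
    x0 = np.concatenate([np.asarray(freqs_hz, float), np.asarray(bws_hz, float)])
    H = cepstra_jacobian(freqs_hz, bws_hz, n_cep, fs)
    offset = formants_to_cepstra(freqs_hz, bws_hz, n_cep, fs) - H @ x0
    return H, offset

if __name__ == "__main__":
    # Hypothetical resonance values for a vowel-like frame.
    H, offset = linearize([500.0, 1500.0, 2500.0], [80.0, 120.0, 160.0])
    print(H.shape, offset.shape)
```

In a Kalman filtering/smoothing setting, H would play the role of the observation matrix for the current frame and offset that of a bias term; the adaptively trained residual described in the abstract would be an additional correction on top of this and is not modeled here.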