Adaptive Kalman smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model

Li Deng, Hagai Attias, Leo Lee, and Alex Acero

Abstract

A novel Kalman filtering/smoothing algorithm is presented for efficient and accurate estimation of vocal tract resonances or formants, which are natural frequencies and bandwidths of the resonator from larynx to lips, in fluent speech. The algorithm uses a hidden dynamic model, with a state-space formulation, where the resonance frequency and bandwidth values are treated as continuous-valued hidden state variables. The observation equation of the model is constructed by an analytical predictive function from the resonance frequencies and bandwidths to LPC cepstra as the observation vectors. This nonlinear function is adaptively linearized, and a residual or bias term, which is adaptively trained, is added to the nonlinear function to represent the iteratively reduced piecewise linear approximation error. Details of the piecewise linearization design process are described. An iterative tracking algorithm is presented, which embeds both the adaptive residual training and piecewise linearization design in the Kalman filtering/smoothing framework. Experiments on estimating resonances in Switchboard speech data show accurate estimation results. In particular, the effectiveness of the adaptive residual training is demonstrated. Our approach provides a solution to the traditional “hidden formant problem,” and produces meaningful results even during consonantal closures when the supra-laryngeal source may cause no spectral prominences in speech acoustics.
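
As a rough illustration of the observation model described in the abstract, the following Python sketch maps resonance frequencies and bandwidths to LPC cepstra using the standard all-pole relation c_n = (2/n) Σ_k exp(-π n b_k / f_s) cos(2π n f_k / f_s), linearizes that mapping, and applies one Kalman measurement update with an additive residual term. Several parts are assumptions made for illustration and are not taken from the article: the function names, the 8 kHz sampling rate, the cepstral order of 15, the use of a first-order Taylor expansion at the predicted state (an extended-Kalman-style stand-in for the piecewise linearization design, which the abstract does not detail), and the frame-averaged residual estimate (a simple stand-in for the adaptive residual training).

```python
import numpy as np

FS_HZ = 8000.0  # assumed sampling rate (telephone-band speech); not from the article


def predict_cepstra(x, n_ceps=15, fs=FS_HZ):
    """Map state x = [f_1..f_K, b_1..b_K] (Hz) to LPC cepstra c_1..c_n."""
    f, b = np.split(np.asarray(x, dtype=float), 2)
    n = np.arange(1, n_ceps + 1)[:, None]      # column of cepstral orders
    decay = np.exp(-np.pi * n * b / fs)        # pole radius raised to the n-th power
    return (2.0 / n[:, 0]) * (decay * np.cos(2.0 * np.pi * n * f / fs)).sum(axis=1)


def cepstra_jacobian(x, n_ceps=15, fs=FS_HZ):
    """Analytical Jacobian of predict_cepstra with respect to [f; b]."""
    f, b = np.split(np.asarray(x, dtype=float), 2)
    n = np.arange(1, n_ceps + 1)[:, None]
    decay = np.exp(-np.pi * n * b / fs)
    phase = 2.0 * np.pi * n * f / fs
    d_freq = -(4.0 * np.pi / fs) * decay * np.sin(phase)
    d_bw = -(2.0 * np.pi / fs) * decay * np.cos(phase)
    return np.hstack([d_freq, d_bw])


def kalman_update(x_pred, P_pred, obs, residual, R, fs=FS_HZ):
    """One measurement update, linearizing at the predicted state (assumption)."""
    H = cepstra_jacobian(x_pred, len(obs), fs)
    innovation = obs - (predict_cepstra(x_pred, len(obs), fs) + residual)
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T       # gain P H^T S^{-1} (S, P symmetric)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new


def estimate_residual(obs_frames, states):
    """Hypothetical residual estimate: mean prediction error over frames."""
    preds = np.array([predict_cepstra(x, obs_frames.shape[1]) for x in states])
    return (obs_frames - preds).mean(axis=0)
```

In a full tracker, an update of this kind would sit inside a forward filtering pass and a backward smoothing pass over all frames, with the residual re-estimated between iterations; the sketch covers only a single frame's measurement update.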

Details

Publication type: Article
Published in: IEEE Transactions on Audio, Speech, and Language Processing
Pages: 13-23
Volume: 15
Number: 1
Publisher: Institute of Electrical and Electronics Engineers, Inc.