Use of Neural Network Mapping and Extended Kalman Filter to Recover Vocal Tract Resonances from the MFCC Parameters of Speech

R. Togneri and Li Deng

Abstract

In this paper, we present a state-space formulation of a neural-network-based hidden dynamic model of speech whose parameters are trained using an approximate EM algorithm. The training makes use of the results of an off-the-shelf formant tracker (during the vowel segments) to simplify the complex sufficient statistics that would be required in the exact EM algorithm. The trained model, consisting of the state equation for the target-directed vocal tract resonance (VTR) dynamics on all classes of speech sounds (including consonant closure) and the observation equation for mapping from the VTR to the acoustic measurement, is then used to recover the unobserved VTRs with an extended Kalman filter. The results demonstrate accurate estimation of the VTRs, especially during rapid consonant-vowel or vowel-consonant transitions and during consonant closure, when the acoustic measurement alone provides weak or no information from which to infer the VTR values.
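The combination described above, a target-directed linear state equation for the VTR dynamics and a nonlinear (neural-network) observation equation mapping VTRs to acoustic features, can be sketched as one extended Kalman filter step. This is a minimal illustration, not the paper's trained model: the mapping `h`, its weights, the state dimension (3 VTRs), the observation dimension, and all numeric parameters are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical sketch (illustrative parameters only, not the paper's model):
#   state equation:  x_{k+1} = phi * x_k + (1 - phi) * target + w_k
#                    (target-directed VTR dynamics)
#   observation:     y_k = h(x_k) + v_k
#                    (neural-net-style mapping from VTRs to acoustic features)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # stand-in weights for the VTR -> feature mapping
b = rng.standard_normal(4)

def h(x):
    """Stand-in for the trained neural mapping from 3 VTRs to 4 acoustic features."""
    return np.tanh(W @ x + b)

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (EKF linearization)."""
    m = f(x).size
    J = np.zeros((m, x.size))
    for i in range(x.size):
        d = np.zeros(x.size)
        d[i] = eps
        J[:, i] = (f(x + d) - f(x - d)) / (2 * eps)
    return J

def ekf_step(x, P, y, phi, target, Q, R):
    """One predict/update cycle of the extended Kalman filter."""
    # predict: the dynamics pull the state toward the VTR target
    x_pred = phi * x + (1 - phi) * target
    F = phi * np.eye(x.size)
    P_pred = F @ P @ F.T + Q
    # update: linearize the observation mapping around the prediction
    H = jacobian(h, x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new

# one filtering step on a synthetic observation
x0 = np.array([0.5, 1.5, 2.5])          # normalized initial VTR guess (kHz-scaled)
P0 = 0.1 * np.eye(3)
target = np.array([0.7, 1.2, 2.4])      # hypothetical phone-specific VTR target
y = h(target) + 0.01 * rng.standard_normal(4)
x1, P1 = ekf_step(x0, P0, y, phi=0.9, target=target,
                  Q=0.01 * np.eye(3), R=0.01 * np.eye(4))
```

Because the state equation keeps pulling the estimate toward the target even when the observation carries little information (e.g. during closure), the filter can still produce plausible VTR trajectories there, which is the behavior the abstract highlights.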

Details

Publication type: Inproceedings
Published in: Proc. Int. Conf. on Spoken Language Processing