Li Deng, I. Bazzi, and Alex Acero
This paper presents a technique for high-accuracy tracking of formants, or vocal tract resonances, that combines a novel nonlinear predictor with a target-directed temporal constraint. The nonlinear predictor is constructed from a parameter-free, discrete mapping function from the formant (frequency and bandwidth) space to the LPC-cepstral space, with trainable residuals. We examine the key role that vocal tract resonance targets play in tracking accuracy. Experimental results show that, owing to the use of the targets, the tracked formants in the consonantal regions of the speech utterance (including closures and short pauses) exhibit the same dynamic properties as those in the vocalic regions and reflect the underlying vocal tract resonances. The results also demonstrate the effectiveness of training the prediction-residual parameters and of incorporating the target-based constraint in obtaining high-accuracy formant estimates, especially for non-sonorant portions of speech.
Published in: Proc. of the Eurospeech Conference, Geneva
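
As an illustration of the kind of mapping the abstract describes, the following is a minimal sketch of the standard analytic relation between the resonances of an all-pole model and its LPC cepstrum, c_n = (2/n) Σ_k exp(-π n b_k / f_s) cos(2π n f_k / f_s). The abstract does not state the exact form used in the paper, so this formula is an assumption here, and the function name `formants_to_lpc_cepstrum` and the example values are hypothetical. The sketch omits both the discretization of the formant space and the trainable residual term.

```python
import math


def formants_to_lpc_cepstrum(freqs_hz, bandwidths_hz, sample_rate_hz, order):
    """Map formant frequencies and bandwidths to LPC cepstral coefficients.

    Assumes an all-pole model in which each resonance is a complex-conjugate
    pole pair with radius exp(-pi * b / fs) and angle 2 * pi * f / fs.
    Returns the list [c_1, ..., c_order]. This is an illustrative sketch,
    not the paper's implementation (no residual, no quantization).
    """
    if len(freqs_hz) != len(bandwidths_hz):
        raise ValueError("freqs_hz and bandwidths_hz must have equal length")
    cepstrum = []
    for n in range(1, order + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            decay = math.exp(-math.pi * n * b / sample_rate_hz)
            total += decay * math.cos(2.0 * math.pi * n * f / sample_rate_hz)
        cepstrum.append(2.0 * total / n)
    return cepstrum


if __name__ == "__main__":
    # Hypothetical three-formant frame sampled at 8 kHz.
    coeffs = formants_to_lpc_cepstrum(
        freqs_hz=[500.0, 1500.0, 2500.0],
        bandwidths_hz=[60.0, 90.0, 120.0],
        sample_rate_hz=8000.0,
        order=12,
    )
    for n, c in enumerate(coeffs, start=1):
        print(f"c_{n} = {c:.4f}")
```

Because the mapping has a closed form and no free parameters, a tracker built on it would only need to learn the residual between the predicted and observed cepstra, which is consistent with the "parameter-free ... with trainable residuals" description above.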