Tracking Vocal Tract Resonances Using a Quantized Nonlinear Function Embedded in a Temporal Constraint

This paper presents a new technique for high-accuracy tracking of vocal-tract resonances (which coincide with formants for nonnasalized vowels) in natural speech. The technique is based on a discretized nonlinear prediction function, which is embedded in a temporal constraint on the quantized input values over adjacent time frames as prior knowledge of their temporal behavior. The nonlinear prediction is constructed, based on its analytical form derived in detail in this paper, as a parameter-free, discrete mapping function that approximates the "forward" relationship from the resonance frequencies and bandwidths to the Linear Predictive Coding (LPC) cepstra of real speech. Discretization of the function permits "inversion" of the function via a search operation. We further introduce the nonlinear-prediction residual, characterized by a multivariate Gaussian vector with trainable mean vectors and covariance matrices, to account for errors due to the functional approximation. We develop and describe an expectation–maximization (EM)-based algorithm for training the parameters of the residual, and a dynamic-programming-based algorithm for resonance tracking. Details of the algorithm implementation for computation speedup are provided. Experimental results are presented that demonstrate the effectiveness of our new paradigm for tracking vocal-tract resonances. In particular, we show the effectiveness of training the prediction-residual parameters in obtaining high-accuracy resonance estimates, especially during consonantal closure.
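The core idea of the forward mapping and its inversion by search can be sketched as follows. Under the standard all-pole assumption, a resonance with frequency f and bandwidth b (both in Hz) contributes a closed-form term to the LPC cepstrum, c_n = (2/n) · exp(−πnb/fs) · cos(2πnf/fs); quantizing the (f, b) inputs then turns inversion into a nearest-neighbor table search. This is a minimal illustrative sketch, not the paper's implementation: the sampling rate, cepstrum order, and grid sizes below are assumed values, and the paper's actual tracker additionally uses the Gaussian residual model and a dynamic-programming temporal constraint.

```python
import numpy as np

FS = 8000.0   # sampling rate in Hz (assumed for illustration)
N_CEP = 15    # number of cepstral coefficients (assumed)

def forward_cepstrum(freqs, bws, fs=FS, n_cep=N_CEP):
    """'Forward' mapping: resonance frequencies/bandwidths -> LPC cepstrum.

    Each resonance is a conjugate pole pair of an all-pole model, whose
    cepstral contribution is c_n = (2/n) exp(-pi n b / fs) cos(2 pi n f / fs).
    """
    n = np.arange(1, n_cep + 1)[:, None]   # orders 1..n_cep, shape (n_cep, 1)
    f = np.asarray(freqs, dtype=float)[None, :]
    b = np.asarray(bws, dtype=float)[None, :]
    terms = (2.0 / n) * np.exp(-np.pi * n * b / fs) * np.cos(2.0 * np.pi * n * f / fs)
    return terms.sum(axis=1)               # sum over resonances -> (n_cep,)

# Discretize the input values: precompute the forward function on a grid
# of quantized (frequency, bandwidth) pairs (grid ranges are illustrative).
f_grid = np.linspace(300.0, 3000.0, 28)    # quantized resonance frequencies
b_grid = np.linspace(60.0, 300.0, 5)       # quantized bandwidths
entries = [(f, b) for f in f_grid for b in b_grid]
codebook = np.array([forward_cepstrum([f], [b]) for f, b in entries])

def invert(observed_cepstrum):
    """'Inversion' of the discretized function via a search operation:
    return the quantized (f, b) whose predicted cepstrum is nearest."""
    dists = np.linalg.norm(codebook - observed_cepstrum, axis=1)
    return entries[int(np.argmin(dists))]

# Round trip: the search recovers the grid point that generated the cepstrum.
f_true, b_true = f_grid[10], b_grid[2]
f_hat, b_hat = invert(forward_cepstrum([f_true], [b_true]))
```

Because the mapping is tabulated rather than parametric, no analysis-by-synthesis optimization is needed at decode time; the full tracker would score each quantized candidate against the observed cepstrum through the trained Gaussian residual and chain the frame-level scores with dynamic programming.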


In IEEE Transactions on Audio, Speech, and Language Processing

Details

Type: Article
Pages: 425–434
Volume: 14
Number: 2