Jinyu Li, Li Deng, Dong Yu, Yifan Gong, and Alex Acero
In this paper, we present our recent development of a model-domain, environment-robust adaptation algorithm, which demonstrates high performance on the standard Aurora 2 speech recognition task. The algorithm consists of two main steps. First, the noise and channel parameters are estimated by jointly using a nonlinear environment distortion model in the cepstral domain, the speech recognizer’s “feedback” information, and the Vector-Taylor-Series (VTS) linearization technique. Second, the estimated noise and channel parameters are used to adapt the static and dynamic portions of the HMM means and variances. This two-step algorithm enables Joint compensation of both Additive and Convolutive distortions (JAC). In the experimental evaluation on the standard Aurora 2 task, the proposed JAC/VTS algorithm achieves 91.11% accuracy, using the clean-trained simple HMM backend as the baseline system for model adaptation. This represents high recognition performance on this task without discriminative training of the HMM system. Detailed analysis of the experimental results shows that adapting the dynamic portion of the HMM mean and variance parameters is critical to the success of our algorithm.
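For readers unfamiliar with the second step, the cepstral-domain distortion model is commonly written as y = x + h + C log(1 + exp(C^{-1}(n - x - h))), where x, n, h, and y are the clean-speech, noise, channel, and noisy-speech cepstra and C is the DCT matrix. Its first-order VTS expansion gives the Jacobian G = C diag(1 / (1 + exp(C^{-1}(mu_n - mu_x - mu_h)))) C^{-1}. The NumPy code below is a minimal sketch of adapting one Gaussian's static and delta parameters under that model. It is not the authors' implementation. It assumes a square orthonormal DCT matrix, so C^{-1} = C^T (real front ends truncate the cepstra and use a pseudo-inverse), and diagonal covariances. The names `dct_matrix`, `distortion`, and `vts_adapt` are hypothetical. The noise and channel estimation of the first step is not shown.

```python
import numpy as np


def dct_matrix(d: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C (d x d); assumed square, so C^{-1} = C.T."""
    k = np.arange(d)[:, None]
    m = np.arange(d)[None, :]
    c = np.sqrt(2.0 / d) * np.cos(np.pi * k * (2 * m + 1) / (2 * d))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so that C @ C.T == I
    return c


def distortion(x, h, n, C):
    """Nonlinear model y = x + h + C log(1 + exp(C^{-1}(n - x - h)))."""
    return x + h + C @ np.log1p(np.exp(C.T @ (n - x - h)))


def vts_adapt(mu_x, var_x, dmu_x, dvar_x, mu_h, mu_n, var_n, dvar_n, C):
    """First-order VTS adaptation of one Gaussian (hypothetical helper).

    Means are cepstral vectors; variances are diagonal, passed as vectors.
    Returns adapted static mean/variance and delta mean/variance.
    """
    d = len(mu_x)
    eye = np.eye(d)
    # Jacobian dy/dx evaluated at the expansion point (mu_x, mu_h, mu_n).
    f = 1.0 / (1.0 + np.exp(C.T @ (mu_n - mu_x - mu_h)))
    G = C @ np.diag(f) @ C.T
    # dy/dn = I - G under the same model.
    H = eye - G

    mu_y = distortion(mu_x, mu_h, mu_n, C)
    # Propagate covariances and keep only the diagonal.
    var_y = np.diag(G @ np.diag(var_x) @ G.T + H @ np.diag(var_n) @ H.T)
    # Dynamic (delta) parameters: linear transform through the same Jacobian.
    dmu_y = G @ dmu_x
    dvar_y = np.diag(G @ np.diag(dvar_x) @ G.T + H @ np.diag(dvar_n) @ H.T)
    return mu_y, var_y, dmu_y, dvar_y
```

The delta-delta parameters would be adapted the same way as the deltas. Applying the same G to the dynamic parameters is what lets the noise estimate also compensate the delta features.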
Published in: Proc. IEEE Automatic Speech Recognition and Understanding