Jinyu Li, Li Deng, Dong Yu, Yifan Gong, and Alex Acero
In this paper, we present our recent development of a model-domain environment-robust adaptation algorithm, which demonstrates high performance in the standard Aurora 2 speech recognition task. The algorithm consists of two main steps. First, the noise and channel parameters are estimated by jointly using a nonlinear environment distortion model in the cepstral domain, the speech recognizer's "feedback" information, and the Vector-Taylor-Series (VTS) linearization technique. Second, the estimated noise and channel parameters are used to adapt the static and dynamic portions of the HMM means and variances. This two-step algorithm enables Joint compensation of both Additive and Convolutive distortions (JAC).
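To make the second step concrete, the following is a minimal sketch of first-order VTS mean adaptation. It assumes a commonly used form of the cepstral distortion model, y = x + h + C log(1 + exp(C⁻¹(n − x − h))), which the abstract does not spell out and which may differ in detail from the formulation in the paper. All function names are hypothetical, a full square orthonormal DCT stands in for the truncated cepstral transform of a real front end, and the noise delta mean is assumed to be zero. Variance adaptation and the estimation of the noise and channel parameters are not shown.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                     for i in range(n)])
    return rows

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def speech_weight(z):
    """Numerically stable 1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def log_spectral_offset(mu_x, mu_n, mu_h, C_inv):
    """C^{-1} (mu_n - mu_x - mu_h): noise-to-speech ratio per log-spectral bin."""
    return matvec(C_inv, [n - x - h for x, n, h in zip(mu_x, mu_n, mu_h)])

def adapt_static_mean(mu_x, mu_n, mu_h, C, C_inv):
    """Static mean: mu_x + mu_h + C log(1 + exp(C^{-1}(mu_n - mu_x - mu_h)))."""
    d = log_spectral_offset(mu_x, mu_n, mu_h, C_inv)
    g = matvec(C, [softplus(z) for z in d])
    return [x + h + gi for x, h, gi in zip(mu_x, mu_h, g)]

def jacobian(mu_x, mu_n, mu_h, C, C_inv):
    """G = C diag(1 / (1 + exp(d))) C^{-1}, the derivative of y w.r.t. x."""
    d = log_spectral_offset(mu_x, mu_n, mu_h, C_inv)
    w = [speech_weight(z) for z in d]
    size = len(w)
    return [[sum(C[i][k] * w[k] * C_inv[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]

def adapt_delta_mean(mu_dx, G):
    """Dynamic (delta) mean under the zero noise-delta-mean assumption."""
    return matvec(G, mu_dx)
```

Two limiting cases give a quick sanity check on the sketch: when the noise mean is far below the speech mean in every log-spectral bin, the adapted static mean reduces to mu_x + mu_h and G approaches the identity; when the noise dominates, the adapted static mean approaches mu_n and G approaches zero, so the delta means shrink toward zero.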
In the experimental evaluation on the standard Aurora 2 task, the proposed JAC/VTS algorithm achieves 91.11% accuracy when the clean-trained simple HMM backend serves as the baseline system for model adaptation. This represents high recognition performance on this task without discriminative training of the HMM system. Detailed analysis of the experimental results shows that adapting the dynamic portion of the HMM mean and variance parameters is critical to the success of our algorithm.
Index Terms— vector Taylor series, joint compensation, additive and convolutive distortions, robust ASR
Published in: Proceedings IEEE Workshop on ASRU
Publisher: Institute of Electrical and Electronics Engineers, Inc.
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.