Jinyu Li, Michael L. Seltzer, and Yifan Gong
By explicitly modelling the distortion of speech signals, model adaptation based on vector Taylor series (VTS) approaches have been shown to significantly improve the robustness of speech recognizers to environmental noise. However, the computational cost of VTS model adaptation (MVTS) methods hinders them from being widely used because they need to adapt all the HMM parameters for every utterance at runtime. In contrast, VTS feature enhancement (FVTS) methods have more computation advantages because they do not need multiple decoding passes and do not adapt all the HMM model parameters. In this paper, we propose two improvements to VTS feature enhancement: updating all of the environment distortion parameters and noise adaptive training of the front-end GMM. In addition, we investigate some other performance-related issues such as the selection of FVTS algorithms and the spectrum domain that MFCC is extracted from. As an important result of our investigation, we established the FVTS method can achieve comparable accuracy as the MVTS method with a smaller runtime cost. This makes FVTS method an ideal candidate for real world tasks.
In Proc. ICASSP