Yu Zhang, Jian Xu, Zhi-Jie Yan, and Qiang Huo
22 March 2011
This paper presents a discriminative training (DT) approach to irrelevant variability normalization (IVN) based training of feature transforms and hidden Markov models for large vocabulary continuous speech recognition. A speaker-clustering based method is used for acoustic sniffing, and maximum mutual information (MMI) is used as the training criterion. Combined with unsupervised adaptation of feature transforms, the IVN-based DT approach achieves a 14.5% relative word error rate reduction over an MMI-trained baseline system on a Switchboard-1 conversational telephone speech transcription task.
|Published in||IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011)|
|Publisher||IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)|