Jian Xu, Yu Zhang, Zhi-Jie Yan, and Qiang Huo
27 August 2011
This paper presents a new approach to acoustic sniffing for irrelevant variability normalization (IVN) based acoustic model training and speech recognition. Given a training corpus, a so-called i-vector is extracted from each training speech segment. A clustering algorithm is used to cluster the training i-vectors into multiple clusters, each corresponding to an acoustic condition. The acoustic sniffing can then be implemented as finding the most similar cluster by comparing the i-vector extracted from a speech segment with the centroid of each cluster. Experimental results on Switchboard-1 conversational telephone speech transcription task suggest that the i-vector based acoustic sniffing outperforms our previous Gaussian mixture model (GMM) based approach. The proposed approach is very efficient therefore can deal with very large scale training corpus on current mainstream computing platforms, yet has very low run-time cost.
|Published in||12th Annual Conference of the International Speech Communication Association, InterSpeech 2011|
|Publisher||International Speech Communication Association|