T. Kristjansson, B. Frey, Li Deng, and Alex Acero
Recognition rates of speech recognition systems are known
to degrade substantially when there is a mismatch between
training and deployment environments. One approach to
tackling this problem is to transform the acoustic models
based on the channel distortion and noise characteristics of
the new environment. Currently, most model adaptation
strategies assume that the noise characteristics are stationary.
We present results for using multiple noise distributions
with the Whisper large-vocabulary speech recognition
system. The distributions are adapted with the Vector Taylor
Series method, using either a weighted average of the noise
states or the locally best noise states. Our results indicate
that, for certain types of noise, significant gains in
recognition accuracy can be achieved.
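
To make the two strategies concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single log-spectral dimension, no channel term, and the standard mismatch function y = x + log(1 + exp(n - x)) linearized to first order. The function names, the moment-matched form of the weighted average, and the per-frame likelihood criterion for the "locally best" state are all assumptions made for illustration.

```python
import math

def vts_adapt(mu_x, var_x, mu_n, var_n):
    """First-order VTS adaptation of one clean-speech Gaussian (assumed form).

    Expands y = x + log(1 + exp(n - x)) around the means (mu_x, mu_n).
    """
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))       # dy/dx at the expansion point
    mu_y = mu_x + math.log1p(math.exp(mu_n - mu_x))
    var_y = g * g * var_x + (1.0 - g) ** 2 * var_n  # dy/dn = 1 - g
    return mu_y, var_y

def log_gauss(y, mu, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)

def weighted_average_state(noise_states):
    """Collapse (weight, mean, var) noise states into one Gaussian.

    Assumption: the average is a moment match of the weighted mixture.
    """
    total = sum(w for w, _, _ in noise_states)
    mean = sum(w * m for w, m, _ in noise_states) / total
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in noise_states) / total
    return mean, var

def locally_best_state(y, mu_x, var_x, noise_states):
    """Pick the noise state whose adapted model best explains frame y.

    Assumption: "locally best" means highest per-frame likelihood.
    """
    def score(state):
        _, mu_n, var_n = state
        return log_gauss(y, *vts_adapt(mu_x, var_x, mu_n, var_n))
    return max(noise_states, key=score)

# Hypothetical two-state noise model: a quiet state and a loud state.
states = [(0.5, 0.0, 1.0), (0.5, 8.0, 1.0)]
avg_mean, avg_var = weighted_average_state(states)
adapted_avg = vts_adapt(2.0, 1.0, avg_mean, avg_var)
best = locally_best_state(8.1, 2.0, 1.0, states)
```

Under these assumptions, the weighted average yields one adapted model for all frames, while the locally best selection can switch noise states from frame to frame, which is where non-stationary noise would be expected to benefit.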
Published in: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing