Towards Non-Stationary Model-Based Noise Adaptation for Large Vocabulary Speech Recognition

Recognition rates of speech recognition systems are known to degrade substantially when there is a mismatch between training and deployment environments. One approach to tackling this problem is to transform the acoustic models based on the channel distortion and noise characteristics of the new environment. Currently, most model adaptation strategies assume that the noise characteristics are stationary. We present results for using multiple noise distributions for the Whisper large vocabulary speech recognition system. The distributions are adapted with the Vector Taylor Series method, using either a weighted average of the noise states or the locally best noise states. Our results indicate that for certain types of noise, significant gains in recognition accuracy can be achieved.

PDF: 2001-trausti-icassp.pdf

In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing

Details

Type: Inproceedings