Mike Seltzer and Alex Acero
We recently proposed a method for HMM adaptation to noisy environments called Linear Spline Interpolation (LSI). LSI uses linear spline regression to model the relationship between clean and noisy speech features. In the original algorithm, stereo training data was used to learn the spline parameters that minimize the error between the predicted and actual noisy speech features. The estimated splines are then used at runtime to adapt the clean HMMs to the current environment. While good results can be obtained with this approach, the performance is limited by the fact that the splines are trained independently from the speech recognizer and as such, they may actually be suboptimal for adaptation. In this work, we introduce a new Generalized EM algorithm for estimating the spline parameters using the speech recognizer itself. Experiments on the Aurora 2 task show that using LSI adaptation with splines trained in this manner results in a 20% improvement over the original LSI algorithm that used splines estimated from stereo data and a 28% improvement over VTS adaptation.
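To make the idea of linear spline regression on stereo data concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single scalar feature dimension, fixed and hand-chosen knot locations, and an ordinary least-squares fit; the function names `hat_basis`, `fit_linear_spline`, and `predict_noisy` are hypothetical. The recognizer-based Generalized EM training described above is not shown.

```python
import numpy as np


def hat_basis(x, knots):
    """Design matrix of tent (hat) functions for a linear spline.

    Row i holds the linear-interpolation weights of x[i] on the knots,
    so each row has at most two non-zero entries that sum to 1.
    """
    knots = np.asarray(knots, dtype=float)
    # Inputs outside the knot range are clamped to the end knots.
    x = np.clip(np.asarray(x, dtype=float), knots[0], knots[-1])
    # Index j of the segment [knots[j], knots[j + 1]] containing each x.
    j = np.clip(np.searchsorted(knots, x, side="right") - 1, 0, len(knots) - 2)
    t = (x - knots[j]) / (knots[j + 1] - knots[j])
    basis = np.zeros((x.size, len(knots)))
    rows = np.arange(x.size)
    basis[rows, j] = 1.0 - t
    basis[rows, j + 1] = t
    return basis


def fit_linear_spline(clean, noisy, knots):
    """Least-squares estimate of the spline's value at each knot,
    fitted from paired (stereo) clean and noisy samples."""
    basis = hat_basis(clean, knots)
    values, *_ = np.linalg.lstsq(basis, np.asarray(noisy, dtype=float), rcond=None)
    return values


def predict_noisy(clean, knots, values):
    """Predict the noisy feature from the clean feature with the fitted spline."""
    return np.interp(clean, knots, values)
```

With the hat-function parameterization, the fitted coefficients are simply the spline's values at the knots, which is why prediction reduces to `np.interp`. In this sketch the fit minimizes the squared error between predicted and actual noisy features, mirroring the stereo-data criterion described above only in its simplest form.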
Publisher: International Speech Communication Association