Mike Seltzer, Alex Acero, and Kaustubh Kalgaonkar
We recently proposed a new algorithm to perform acoustic model adaptation to noisy environments called Linear Spline Interpolation (LSI). In this method, the nonlinear relationship between clean and noisy speech features is modeled using linear spline regression. Linear spline parameters that minimize the error the between the predicted noisy features and the actual noisy features are learned from training data. A variance associated with each spline segment captures the uncertainty in the assumed model. In this work, we extend the LSI algorithm in two ways. First, the adaptation scheme is extended to compensate for the presence of linear channel distortion. Second, we show how the noise and channel parameters can be updated during decoding in an unsupervised manner within the LSI framework. Using LSI, we obtain an average relative improvement in word error rate of 10.8% over VTS adaptation on the Aurora 2 task with improvements of 15-18% at SNRs between 10 and 15 dB.
© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/