LASSO environment model combination for robust speech recognition

Xiong Xiao, Jinyu Li, et al.


In this paper, we propose a novel acoustic model adaptation method for noise-robust speech recognition. Model combination is a common way to adapt acoustic models to a target test environment. For example, the mean supervector of the adapted model is obtained as a linear combination of the mean supervectors of many pre-trained environment-dependent acoustic models. Usually, the combination weights are estimated using a maximum likelihood (ML) criterion, and the weights are nonzero for all the mean supervectors. We propose to estimate the weights using Lasso (least absolute shrinkage and selection operator), which imposes an L1 regularization term on the weight estimation problem to shrink some weights to exactly zero. Our study shows that Lasso usually shrinks to zero the weights of those mean supervectors that are not relevant to the test environment. By removing these irrelevant supervectors, the obtained mean supervectors are found to be more robust against noise distortions. Experimental results on the Aurora-2 task show that the Lasso-based mean combination consistently outperforms the ML-based combination.


Publication type: Inproceedings
Published in: Proc. ICASSP