Yanmin Qian, Daniel Povey, and Jia Liu
Large vocabulary continuous speech recognition is a difficult task, particularly for low-resource languages. The scenario we focus on here is having only one hour of acoustic training data in the “target” language. This paper presents work on a data borrowing strategy combined with the recently proposed Subspace Gaussian Mixture Model (SGMM). We developed data borrowing strategies based on two approaches: one that minimizes Kullback-Leibler (KL) divergence, and one that additionally takes state occupation counts into account. By borrowing data from the non-target language at the acoustic state level, the SGMMs are more robustly estimated, and we demonstrate improvements over the baseline SGMM setup, which itself outperforms a conventional HMM-GMM system. Although we tested the approach with SGMMs, we expect the general idea of borrowing data from a non-target language to be applicable to conventional GMMs as well.
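The abstract does not give the selection formulas, but the KL-based borrowing idea can be illustrated with a minimal sketch: for each target-language acoustic state, rank non-target (source) states by the closed-form KL divergence between their diagonal-covariance Gaussians, optionally penalising poorly occupied source states. All function names, the occupation-count weighting, and the data layout here are illustrative assumptions, not the paper's actual method.

```python
import math

def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL(N0 || N1) for diagonal-covariance Gaussians (closed form)."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )

def borrow_candidates(target_states, source_states, occ_counts=None, top_k=1):
    """For each target state, pick the top_k closest source states by KL
    divergence; occ_counts, if given, is a hypothetical weighting that
    down-ranks source states with low occupation counts."""
    result = {}
    for t_id, (t_mu, t_var) in target_states.items():
        scored = []
        for s_id, (s_mu, s_var) in source_states.items():
            d = kl_diag_gauss(t_mu, t_var, s_mu, s_var)
            if occ_counts is not None:
                # penalise rarely observed source states (illustrative only)
                d += 1.0 / max(occ_counts.get(s_id, 1.0), 1e-3)
            scored.append((d, s_id))
        scored.sort()
        result[t_id] = [s_id for _, s_id in scored[:top_k]]
    return result

# Toy usage: the target state borrows from the acoustically closest source state.
target = {"t1": ([0.0, 0.0], [1.0, 1.0])}
source = {"near": ([0.1, 0.0], [1.0, 1.0]), "far": ([5.0, 5.0], [1.0, 1.0])}
print(borrow_candidates(target, source))  # {'t1': ['near']}
```

The actual paper operates on SGMM state-level statistics rather than single Gaussians per state; this sketch only conveys the distance-plus-occupancy ranking idea.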
Publisher: International Speech Communication Association