Using collective information in semi-supervised learning for speech recognition

  • Balakrishnan Varadarajan,
  • Dong Yu,
  • Li Deng,
  • Alex Acero

Proceedings of the ICASSP

Published by Institute of Electrical and Electronics Engineers, Inc.

Training accurate acoustic models typically requires a large amount of transcribed data, which can be expensive to obtain. In this paper, we describe a novel semi-supervised learning algorithm for automatic speech recognition. The algorithm determines whether a hypothesized transcription should be used in training by taking into consideration collective information from all available utterances, rather than relying solely on the confidence of the utterance itself. It estimates the expected entropy reduction that each utterance and transcription pair would cause over the whole unlabeled dataset and chooses the pairs with positive gains. We compare our algorithm with an existing confidence-based semi-supervised learning algorithm and show that the former consistently outperforms the latter when the same number of utterances is selected into the training set. We also show that our algorithm can determine the cutoff point in a principled way by demonstrating that the point it finds is very close to the achievable peak point.
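
To make the selection criterion concrete, below is a minimal, hypothetical sketch of choosing (utterance, hypothesis) pairs by an estimated entropy reduction over the unlabeled pool. The paper's estimator operates on recognizer output for the whole dataset; here the effect of accepting a hypothesis on the rest of the pool is replaced by a simple similarity-weighted surrogate, and all names (`select_by_entropy_reduction`, `similarity`, `top_hyp`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))


def select_by_entropy_reduction(posteriors, similarity, top_hyp):
    """Rank (utterance, hypothesis) pairs by estimated entropy reduction
    over the whole unlabeled pool; keep only pairs with positive gain.

    posteriors : (N, K) per-utterance posteriors over K candidate hypotheses
    similarity : (N, N) surrogate for how much accepting utterance i's
                 hypothesis would sharpen utterance j's posterior
                 (a stand-in for the effect of retraining the model)
    top_hyp    : (N,) index of each utterance's best hypothesis
    """
    n = posteriors.shape[0]
    pool_entropy = np.array([entropy(posteriors[j]) for j in range(n)])

    selected = []
    for i in range(n):
        gain = 0.0
        for j in range(n):
            if j == i:
                continue
            # Crude surrogate: accepting i's hypothesis shifts j's posterior
            # toward i's top hypothesis in proportion to their similarity.
            shifted = posteriors[j].copy()
            shifted[top_hyp[i]] += similarity[i, j]
            shifted /= shifted.sum()
            gain += pool_entropy[j] - entropy(shifted)
        if gain > 0:  # principled cutoff: only positive expected gains
            selected.append((i, gain))

    # Highest expected entropy reduction first.
    return sorted(selected, key=lambda x: -x[1])
```

The key contrast with confidence-based selection is visible in the loop: the score of utterance i depends on its estimated effect on every other utterance j in the pool, not on utterance i's own confidence alone, and the zero-gain threshold gives a natural stopping point instead of a hand-tuned confidence cutoff.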