Maximum Entropy Based Generic Filter for Language Model Adaptation

Dong Yu, Milind Mahajan, P. Mau, and Alex Acero

Abstract

Language Model (LM) adaptation has been shown to be very important for reducing the Word Error Rate (WER) in task-specific speech recognition systems. Adaptation data collected in the real world, however, usually contain large amounts of non-dictated text, such as email headers, long URLs, code fragments, included replies, and signatures, that the user will never dictate. Adapting with such data may corrupt the LM. In this paper, we propose a Maximum Entropy (MaxEnt) based filter that removes a variety of non-dictated words from the adaptation data and improves the effectiveness of LM adaptation. We argue that this generic filter is language independent and efficient. We describe the design of the filter and show that using it yields a 10% relative WER reduction over LM adaptation without filtering, and a 22% relative WER reduction over the unadapted LM, on an English email dictation task.
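To make the idea concrete, the sketch below shows one way such a MaxEnt filter could be built: a binary maximum entropy (logistic regression) classifier over simple surface features decides, token by token, whether text looks dictated or non-dictated, and non-dictated tokens are dropped before adaptation. This is not the paper's implementation; the features, training tokens, and labels are hypothetical illustrations, and scikit-learn's LogisticRegression stands in as the MaxEnt model.

```python
# Minimal sketch of a MaxEnt-style filter for LM adaptation data.
# NOT the paper's implementation: the features, labels, and training
# data below are hypothetical stand-ins for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(token: str) -> list:
    """Simple surface features that often signal non-dictated text."""
    return [
        float("@" in token),                                # email-address marker
        float(token.lower().startswith(("http", "www"))),   # URL marker
        float(any(c.isdigit() for c in token)),             # digits (headers, code)
        float(token.isupper() and len(token) > 1),          # all-caps constants
        sum(not c.isalnum() for c in token) / max(len(token), 1),  # symbol density
    ]

# Tiny hypothetical training set: 1 = dictated word, 0 = non-dictated.
train_tokens = ["hello", "meeting", "tomorrow", "thanks",
                "http://example.com", "user@host.com", "X-Mailer:", "0x3F2A"]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

clf = LogisticRegression()  # binary MaxEnt model
clf.fit(np.array([features(t) for t in train_tokens]), train_labels)

def filter_line(line: str) -> str:
    """Keep only the tokens the model classifies as dictated."""
    kept = [t for t in line.split()
            if clf.predict(np.array([features(t)]))[0] == 1]
    return " ".join(kept)

print(filter_line("see you tomorrow http://example.com user@host.com"))
# -> "see you tomorrow"
```

A realistic filter of this kind would use richer contextual features and far more labeled data, but the structure, a lightweight discriminative classifier applied as a preprocessing pass over the adaptation corpus, is what keeps the approach efficient and largely language independent.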

Details

Publication type: Inproceedings
Published in: Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing
Publisher: IEEE