Improvements on Speech Recognition for Fast Talkers

Matthew Richardson, Mei-Yuh Hwang, Alex Acero, and Xuedong Huang

Abstract

The accuracy of a speech recognition (SR) system depends on many factors, such as the presence of background noise, mismatches in microphone and language models, and variation in speaker, accent, and even speaking rate. Beyond naturally fast speakers, even speakers with a normal rate tend to speak faster when using a speech recognition system in order to get higher throughput. Unfortunately, state-of-the-art SR systems perform significantly worse on fast speech. In this paper, we present our efforts to make our system more robust to fast speech. We propose cepstrum length normalization, applied to the incoming test utterances, which yields a 13% word error rate reduction on an independent evaluation corpus. Moreover, this improvement is additive to the contribution of Maximum Likelihood Linear Regression (MLLR) adaptation: together with MLLR, a 23% error rate reduction was achieved.

Details

Publication type: Inproceedings
Published in: Proc. of the Eurospeech Conference