Improvements on Speech Recognition for Fast Talkers

Matthew Richardson, Mei-Yuh Hwang, Alex Acero, and Xuedong Huang


The accuracy of a speech recognition (SR) system depends on
many factors, such as the presence of background noise,
mismatches in microphone and language models, variations in
speaker, accent and even speaking rates. In addition to fast
speakers, even normal speakers will tend to speak faster when
using a speech recognition system in order to get higher
throughput. Unfortunately, state-of-the-art SR systems perform
significantly worse on fast speech. In this paper, we present
our efforts in making our system more robust to fast speech.
We propose cepstrum length normalization, applied to the
incoming testing utterances, which results in a 13% word error
rate reduction on an independent evaluation corpus. Moreover,
this improvement is additive to the contribution of Maximum
Likelihood Linear Regression (MLLR) adaptation. Together
with MLLR, a 23% error rate reduction was achieved.
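
The abstract names cepstrum length normalization but does not say how it is carried out. As a rough illustration only, the sketch below assumes the idea is to stretch the sequence of cepstral frames of a fast utterance along the time axis by linear interpolation before decoding. The function name `stretch_frames`, the `factor` parameter, and the choice of linear interpolation are all assumptions made for this example, not details taken from the paper.

```python
def stretch_frames(frames, factor):
    """Stretch a sequence of cepstral vectors in time (illustrative sketch).

    frames: list of equal-length lists of floats, one list per frame.
    factor: hypothetical stretch factor; values > 1 lengthen the utterance.
    Linear interpolation between neighbouring frames is an assumption here.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    if n_in == 1 or n_out == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for j in range(n_out):
        # Position of output frame j on the input time axis.
        pos = j * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[hi])])
    return out


# Two 2-dimensional frames stretched to three frames.
stretched = stretch_frames([[0.0, 0.0], [2.0, 4.0]], 1.5)
```

In this sketch the first and last frames are preserved and the new frames are placed evenly between them; how a stretch factor would be chosen for a given utterance is left open, since the abstract does not specify it.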


Publication type: Inproceedings
Published in: Proc. of the Eurospeech Conference