An Architecture for Rapid Decoding of Large Vocabulary Conversational Speech

  • George Saon
  • Geoffrey Zweig
  • Brian Kingsbury
  • Lidia Mangu
  • Upendra Chaudhari

Proceedings of Eurospeech

This paper addresses the question of how to design a large vocabulary recognition system so that it can simultaneously handle a sophisticated language model, perform state-of-the-art speaker adaptation, and run at one times real time (1xRT). The architecture we propose is based on classical HMM Viterbi decoding, but uses an extremely fast initial speaker-independent decoding to estimate vocal tract length (VTL) warp factors and feature-space and model-space MLLR transformations, which are then used in a final speaker-adapted decoding. We present results on past Switchboard evaluation data indicating that this strategy compares favorably to published unlimited-time systems (running at several hundred times real time). Coincidentally, this is the system that IBM fielded in the 2003 EARS Rich Transcription evaluation.
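
The two-pass control flow described in the abstract can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: the names `SpeakerTransforms`, `two_pass_decode`, `real_time_factor`, and the three callables passed in are hypothetical placeholders, and the sketch assumes that all three adaptation parameters are estimated from the first-pass output in a single step, which the abstract does not specify. The real-time factor is taken in its usual sense of processing time divided by audio duration, so 1xRT means the two are equal.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SpeakerTransforms:
    """Hypothetical container for the per-speaker adaptation parameters."""
    vtl_warp: float   # vocal tract length warp factor
    fmllr: Any        # feature-space MLLR transformation
    mllr: Any         # model-space MLLR transformation


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (1.0 means 1xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def two_pass_decode(
    audio: Any,
    si_decode: Callable[[Any], Any],
    estimate_transforms: Callable[[Any, Any], SpeakerTransforms],
    sa_decode: Callable[[Any, SpeakerTransforms], Any],
) -> Any:
    """Run a fast speaker-independent pass, adapt, then decode again.

    All three callables are assumptions supplied by the caller; this
    function only fixes the order in which they run.
    """
    # Pass 1: fast speaker-independent decoding gives a first hypothesis.
    first_hypothesis = si_decode(audio)
    # Adaptation: estimate VTL warp and MLLR transforms from that hypothesis.
    transforms = estimate_transforms(audio, first_hypothesis)
    # Pass 2: final speaker-adapted decoding using the estimated transforms.
    return sa_decode(audio, transforms)
```

Under this framing, the combined cost of both decoding passes and the adaptation step has to fit within the audio duration for the whole system to run at 1xRT, which is why the first pass needs to be extremely fast.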