High-Performance Robust Speech Recognition Using Stereo Training Data

L. Jiang; Li Deng; Jasha Droppo; Alex Acero; Xuedong Huang

Proc. ICASSP | May 2001

Published by Institute of Electrical and Electronics Engineers, Inc.

We describe SPLICE, a novel technique for high-performance robust speech recognition. It is an efficient noise reduction and channel distortion compensation technique that makes effective use of stereo training data. In this paper, we present a version of SPLICE that uses the minimum mean square error (MMSE) decision, and describe an extension that trains clusters of HMMs with SPLICE processing. Comprehensive results on a Wall Street Journal large-vocabulary recognition task with a wide range of noise types show that SPLICE outperforms even recognition under noisy matched conditions, giving a 13% word error rate reduction. The new technique also consistently outperforms the spectral subtraction and fixed CDCN noise reduction techniques. It is currently being integrated into Microsoft MiPad, a new-generation PDA prototype.
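The abstract does not spell out the estimator. As a minimal sketch under stated assumptions, the code below follows one common way SPLICE is described: a Gaussian mixture model (GMM) is fitted to noisy features y, each mixture component k carries a correction vector r_k learned from stereo pairs (clean x, noisy y) as the posterior-weighted mean of x - y, and the MMSE clean estimate is x_hat = y + sum_k p(k|y) r_k. The diagonal covariances, the assumption that the GMM is already trained, the bias-only correction, the function names (`posteriors`, `train_corrections`, `splice_mmse`) and the toy 1-D data are all hypothetical illustrations, not the paper's implementation.

```python
import math

def log_gauss_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector y."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )

def posteriors(y, gmm):
    """p(k | y) for each (weight, mean, var) component of a GMM on noisy features."""
    logs = [math.log(w) + log_gauss_diag(y, m, v) for w, m, v in gmm]
    top = max(logs)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def train_corrections(stereo, gmm):
    """Hypothetical training step: one correction vector per component,
    r_k = sum_n p(k|y_n) (x_n - y_n) / sum_n p(k|y_n), from (clean, noisy) pairs."""
    dim = len(stereo[0][0])
    num = [[0.0] * dim for _ in gmm]
    den = [0.0 for _ in gmm]
    for x, y in stereo:
        for k, p in enumerate(posteriors(y, gmm)):
            den[k] += p
            for d in range(dim):
                num[k][d] += p * (x[d] - y[d])
    return [
        [n / den[k] if den[k] > 0 else 0.0 for n in num[k]]
        for k in range(len(gmm))
    ]

def splice_mmse(y, gmm, corrections):
    """MMSE clean-feature estimate: x_hat = y + sum_k p(k|y) r_k."""
    post = posteriors(y, gmm)
    return [
        yd + sum(p * r[d] for p, r in zip(post, corrections))
        for d, yd in enumerate(y)
    ]

# Toy example (invented data): noise shifts low-energy frames by +2
# and high-energy frames by +0.5.
gmm = [(0.5, [2.0], [1.0]), (0.5, [10.0], [1.0])]
stereo = [([0.0], [2.0]), ([0.5], [2.5]), ([9.5], [10.0]), ([10.0], [10.5])]
r = train_corrections(stereo, gmm)
print(splice_mmse([2.2], gmm, r))  # close to [0.2]
```

Because each frame's correction is a posterior-weighted blend of per-component offsets, the mapping from noisy to clean features is piecewise linear in regions dominated by one component and interpolates smoothly between them; a frame at y = 6, equidistant from both components here, receives the average of the two corrections.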

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.