Single-channel Mixed Speech Recognition Using Deep Neural Networks

Chao Weng, Dong Yu, Mike Seltzer, and Jasha Droppo

Abstract

In this work, we study the problem of single-channel mixed speech recognition using deep neural networks (DNNs). Using a multi-style training strategy on artificially mixed speech data, we investigate several different training setups that enable the DNN to generalize to similar patterns in the test data. We also introduce a two-talker decoder based on weighted finite-state transducers (WFSTs) to work with the trained DNNs. Experiments on the 2006 speech separation and recognition challenge task demonstrate that the proposed DNN-based system is remarkably robust to the interference of a competing speaker. The best setup of our proposed systems achieves an overall word error rate (WER) of 19.7%, which improves upon the result obtained by the state-of-the-art IBM superhuman system by 1.9% absolute, with fewer assumptions and lower computational complexity.
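As a rough illustration of what "artificially mixed speech data" can mean in practice, the sketch below adds a scaled masker waveform to a target waveform so that the two reach a chosen target-to-masker ratio (TMR) in dB. It is not the authors' data pipeline: the function names `mean_energy` and `mix_at_tmr`, the synthetic sine-wave "utterances", and the TMR values are all assumptions made for this example.

```python
import math


def mean_energy(signal):
    """Mean squared amplitude of a waveform given as a list of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_tmr(target, masker, tmr_db):
    """Mix two equal-length waveforms at a target-to-masker ratio in dB.

    Hypothetical helper: returns (mixed, scaled_masker), where the masker
    is rescaled so that 10*log10(E_target / E_scaled_masker) == tmr_db.
    """
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")
    e_target = mean_energy(target)
    e_masker = mean_energy(masker)
    if e_target == 0 or e_masker == 0:
        raise ValueError("both signals must have non-zero energy")
    # Solve e_target / (gain**2 * e_masker) = 10**(tmr_db / 10) for gain.
    gain = math.sqrt(e_target / (e_masker * 10 ** (tmr_db / 10)))
    scaled = [gain * m for m in masker]
    mixed = [t + s for t, s in zip(target, scaled)]
    return mixed, scaled


# Synthetic stand-ins for two utterances (assumed 16 kHz, 0.1 s each).
rate = 16000
target = [math.sin(2 * math.pi * 220 * n / rate) for n in range(1600)]
masker = [0.3 * math.sin(2 * math.pi * 330 * n / rate) for n in range(1600)]

# Illustrative TMR conditions for a multi-style training set.
for tmr_db in (6.0, 0.0, -6.0):
    mixed, scaled = mix_at_tmr(target, masker, tmr_db)
    achieved = 10 * math.log10(mean_energy(target) / mean_energy(scaled))
    print(f"requested {tmr_db:+.1f} dB, achieved {achieved:+.2f} dB")
```

Generating the same utterance pair at several ratios is one simple way to expose a model to both dominant and dominated speakers during training; real recordings would also need length alignment and clipping checks, which this sketch omits.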

Details

Publication type: Inproceedings
Published in: ICASSP
Publisher: IEEE SPS