Chao Weng, Dong Yu, Shinji Watanabe, and Fred Jung
In this work, we propose recurrent deep neural networks (DNNs) for robust automatic speech recognition (ASR). Full recurrent connections are added to a certain hidden layer of a conventional feedforward DNN, allowing the model to capture the temporal dependency in deep representations. A new backpropagation through time (BPTT) algorithm is introduced to make minibatch stochastic gradient descent (SGD) on the proposed recurrent DNNs more efficient and effective. We evaluate the proposed recurrent DNN architecture under the hybrid setup on both the 2nd CHiME challenge (track 2) and Aurora-4 tasks. Experimental results on the CHiME challenge data show that the proposed system obtains consistent 7% relative word error rate (WER) improvements over the DNN systems, achieving state-of-the-art performance without front-end preprocessing, speaker adaptive training, or multiple decoding passes. On Aurora-4, the proposed system achieves a 4% relative WER improvement over a strong DNN baseline system.
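As a rough illustration of the architectural idea, the following sketch runs a forward pass through a small feedforward network in which one hidden layer also receives its own activation from the previous frame. It is a minimal sketch under stated assumptions, not the formulation used in this work: the sigmoid nonlinearity, the zero initial state, the linear output layer, and all names (`recurrent_dnn_forward`, `recurrent_weight`, `recurrent_layer`) are hypothetical choices made for the example, and training with BPTT is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(vec, weight, bias):
    """Return vec @ weight + bias; weight is a list of rows, one per input unit."""
    return [
        sum(v * row[j] for v, row in zip(vec, weight)) + bias[j]
        for j in range(len(bias))
    ]


def recurrent_dnn_forward(frames, weights, biases, recurrent_weight, recurrent_layer):
    """Forward pass over a sequence of frames.

    weights[l] and biases[l] define feedforward layer l. The layer with index
    recurrent_layer (assumed to be a hidden layer) additionally receives its
    own previous-frame activation through recurrent_weight (hidden x hidden).
    The last layer is left linear, so the outputs are pre-softmax scores.
    """
    hidden_dim = len(recurrent_weight)
    prev_hidden = [0.0] * hidden_dim  # assumed zero initial state
    zero_bias = [0.0] * hidden_dim
    last = len(weights) - 1
    outputs = []
    for frame in frames:
        activation = frame
        for layer, (weight, bias) in enumerate(zip(weights, biases)):
            pre = affine(activation, weight, bias)
            if layer == recurrent_layer:
                # Add the contribution of the previous frame's hidden state.
                feedback = affine(prev_hidden, recurrent_weight, zero_bias)
                pre = [p + f for p, f in zip(pre, feedback)]
            activation = pre if layer == last else [sigmoid(p) for p in pre]
            if layer == recurrent_layer:
                prev_hidden = activation
        outputs.append(activation)
    return outputs
```

With `recurrent_weight` set to all zeros the function reduces to an ordinary frame-by-frame feedforward DNN, which makes the effect of the recurrent connections easy to isolate: two identical input frames then yield identical outputs, whereas a nonzero `recurrent_weight` makes the second output depend on the first frame.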