Variable-Activation and Variable-Input Deep Neural Network for Robust Speech Recognition

IEEE SLT Workshop

Published by IEEE - Institute of Electrical and Electronics Engineers

In a previous study, we proposed the variable-component deep neural network (VCDNN) to improve the robustness of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM). In that study, we modeled the components of the DNN with a set of polynomial functions of an environmental variable, specifically the signal-to-noise ratio (SNR). We refined VCDNN for two types of DNN components: (1) the weight matrix and bias, and (2) the output of each layer. These two methods are called the variable-parameter DNN (VPDNN) and the variable-output DNN (VODNN), respectively. Although both methods achieved good gains over the standard DNN, they doubled the number of parameters even with only a first-order environment variable. In this study, we propose two new types of VCDNN, namely the variable-activation DNN (VADNN) and the variable-input DNN (VIDNN). The environment variable is applied to the hidden-layer activation function in VADNN, and directly to the input in VIDNN. Both of these VCDNNs add only a negligible number of parameters compared to the standard DNN. Experimental results on the Aurora4 task show that both methods perform similarly to VPDNN, obtaining around a 3.71% relative word error rate reduction over the standard DNN with a negligible increase in the number of parameters.
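To make the two new variants concrete, the following is a minimal NumPy sketch of the ideas described above, for a single hidden layer. All names, shapes, and the first-order form of the SNR polynomial are assumptions for illustration, not the paper's exact formulation: in `vidnn_forward` the SNR is simply appended to the input feature vector, and in `vadnn_forward` a first-order polynomial of the SNR scales the hidden pre-activation before the nonlinearity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# VIDNN (sketch): the environment variable (SNR) is appended to the input
# feature vector, so only the first weight matrix grows by one column.
def vidnn_forward(x, snr, W1, b1, W2, b2):
    x_aug = np.concatenate([x, [snr]])  # input features plus SNR
    h = relu(W1 @ x_aug + b1)
    return W2 @ h + b2

# VADNN (sketch): a first-order polynomial of the SNR, a0 + a1 * snr,
# modulates the hidden-layer activation. Only a0 and a1 are extra
# parameters, a negligible increase over the standard DNN.
def vadnn_forward(x, snr, W1, b1, W2, b2, a0, a1):
    z = W1 @ x + b1
    h = relu((a0 + a1 * snr) * z)  # SNR-dependent activation
    return W2 @ h + b2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_h, d_out = 5, 8, 3  # toy dimensions, not from the paper
    x = rng.standard_normal(d_in)
    W1_vi = rng.standard_normal((d_h, d_in + 1))  # +1 column for SNR
    W1_va = rng.standard_normal((d_h, d_in))
    b1 = rng.standard_normal(d_h)
    W2 = rng.standard_normal((d_out, d_h))
    b2 = rng.standard_normal(d_out)

    y_vi = vidnn_forward(x, 10.0, W1_vi, b1, W2, b2)
    y_va = vadnn_forward(x, 10.0, W1_va, b1, W2, b2, a0=1.0, a1=0.01)
    print(y_vi.shape, y_va.shape)
```

In this toy setup, VIDNN adds only one extra column to the first weight matrix and VADNN adds only the two polynomial coefficients per layer, which is consistent with the abstract's claim of a negligible parameter increase relative to VPDNN and VODNN.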