Hamid Palangi, Li Deng, and Rabab K Ward
Deep Stacking Networks (DSNs) are constructed by stacking shallow feed-forward neural networks on top of each other, using as input to each module the raw input data concatenated with features derived from the lower modules of the DSN. DSNs have no recurrent connections, which makes them less effective at modeling and classifying input data with temporal dependencies. In this paper, we embed recurrent connections into the DSN, giving rise to Recurrent Deep Stacking Networks (R-DSNs). Each module of the R-DSN consists of a special form of recurrent neural network. Generalizing from the earlier DSN, the use of linear output units in the R-DSN enables us to derive a closed form for the gradient of the cost function with respect to all network matrices without backpropagating errors. Each module in the R-DSN is initialized with an echo state network, in which the input and recurrent weights are fixed so that the network has the echo state property. All connection weights within the module are then fine-tuned using batch-mode gradient descent, where the gradient takes an analytical form. Experiments are performed on the TIMIT dataset for frame-level phone state classification with 183 classes. The results show that the R-DSN achieves higher classification accuracy than a single recurrent neural network without stacking.
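The stacking and echo-state initialization described above can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes a tanh recurrent hidden layer, a linear output layer whose weights are fitted by ridge-regularized least squares (the usual closed-form echo state network readout), and a rescaling of the recurrent matrix by its infinity norm as a sufficient condition for the echo state property. All names (`RecurrentModule`, `train_stack`, `rho`, `ridge`) are hypothetical. The batch-mode gradient fine-tuning step is not covered.

```python
import math
import random


def rand_matrix(rows, cols, scale, rng):
    """Matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(M, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(A, B):
    """Solve A X = B (A: n x n, B: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


class RecurrentModule:
    """Hypothetical R-DSN-style module: tanh recurrent hidden layer, linear output."""

    def __init__(self, n_in, n_hidden, rng, rho=0.9):
        # Input and recurrent weights stay fixed, as in an echo state network.
        self.W_in = rand_matrix(n_hidden, n_in, 1.0, rng)
        W = rand_matrix(n_hidden, n_hidden, 1.0, rng)
        # Assumption: rescale so the max absolute row sum (infinity norm) is rho < 1.
        # Since tanh is 1-Lipschitz, the state update is then a contraction,
        # which is sufficient for the echo state property.
        norm = max(sum(abs(w) for w in row) for row in W)
        self.W_rec = [[w * rho / norm for w in row] for row in W]
        self.U = None  # linear readout, fitted in closed form

    def hidden_states(self, xs):
        """Run h_t = tanh(W_in x_t + W_rec h_{t-1}) from h_0 = 0."""
        h = [0.0] * len(self.W_rec)
        states = []
        for x in xs:
            a = matvec(self.W_in, x)
            r = matvec(self.W_rec, h)
            h = [math.tanh(p + q) for p, q in zip(a, r)]
            states.append(h)
        return states

    def fit(self, xs, targets, ridge=1e-4):
        """Ridge least squares for U: (sum h h^T + ridge I) U^T = sum h t^T."""
        H = self.hidden_states(xs)
        n, m = len(H[0]), len(targets[0])
        G = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        B = [[sum(h[i] * t[k] for h, t in zip(H, targets)) for k in range(m)]
             for i in range(n)]
        X = solve(G, B)  # X = U^T, shape n x m
        self.U = [[X[i][k] for i in range(n)] for k in range(m)]  # m x n

    def predict(self, xs):
        """Linear outputs y_t = U h_t."""
        return [matvec(self.U, h) for h in self.hidden_states(xs)]


def train_stack(xs, targets, n_modules, n_hidden, seed=0):
    """Stack modules; each sees the raw input plus all lower modules' outputs."""
    rng = random.Random(seed)
    modules, feats, ys = [], [list(x) for x in xs], None
    for _ in range(n_modules):
        mod = RecurrentModule(len(feats[0]), n_hidden, rng)
        mod.fit(feats, targets)
        ys = mod.predict(feats)
        modules.append(mod)
        # Concatenate this module's output onto the features for the next module.
        feats = [f + y for f, y in zip(feats, ys)]
    return modules, ys
```

For example, on a toy delay task where `xs = [[u_t]]` and each target is the previous input `u_{t-1}`, `train_stack(xs, targets, n_modules=2, n_hidden=20)` returns two modules, the second of which takes two input features (the raw input plus the first module's output). The infinity-norm rescaling is a conservative choice; echo state network practice more often scales by the spectral radius, which needs an eigenvalue routine that this standard-library-only sketch avoids.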
Publisher: IEEE Conference ChinaSIP