Li Deng, Dong Yu, and John Platt
Deep Neural Networks (DNNs) have shown remarkable success in pattern recognition tasks. However, parallelizing DNN training across computers has been difficult. We present the Deep Stacking Network (DSN), which overcomes the problem of parallelizing learning algorithms for deep architectures. The DSN provides a method of stacking simple processing modules to build deep architectures, with a convex learning problem in each module. Additional fine tuning further improves the DSN, while introducing minor non-convexity. Full learning in the DSN is batch-mode, making it amenable to parallel training over many machines and thus scalable over the potentially huge size of the training data. Experimental results on both the MNIST (image) and TIMIT (speech) classification tasks demonstrate that the DSN learning algorithm developed in this work is not only parallelizable in implementation but also attains higher classification accuracy than the DNN.
In ICASSP 2012
Publisher IEEE SPS
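
The abstract describes stacking simple modules, each with a convex learning problem, but does not give the module design. The following is only a minimal sketch of that idea under stated assumptions, not the paper's algorithm: each module uses fixed random hidden weights, its output weights are fitted by ridge-regularized least squares (a convex problem with a closed-form solution), and each later module receives the raw input concatenated with the previous module's predictions. The names `fit_module`, `fit_stack`, `predict`, and the `ridge` parameter are hypothetical, and the fine-tuning step mentioned in the abstract is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_module(X, T, n_hidden, rng, ridge=1e-3):
    """Fit one module. Assumption: hidden weights W are random and fixed;
    only the output weights U are learned."""
    W = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    H = sigmoid(X @ W)  # hidden activations, shape (n_samples, n_hidden)
    # Convex step: minimize ||H U - T||^2 + ridge * ||U||^2 in closed form.
    U = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, U

def fit_stack(X, T, n_modules=3, n_hidden=50, seed=0):
    """Fit modules one after another on the full batch."""
    rng = np.random.default_rng(seed)
    modules, inp = [], X
    for _ in range(n_modules):
        W, U = fit_module(inp, T, n_hidden, rng)
        Y = sigmoid(inp @ W) @ U
        modules.append((W, U))
        # Assumption: the next module sees the raw input plus these predictions.
        inp = np.hstack([X, Y])
    return modules

def predict(modules, X):
    """Run the stack and return the last module's predictions."""
    inp = X
    for W, U in modules:
        Y = sigmoid(inp @ W) @ U
        inp = np.hstack([X, Y])
    return Y
```

In this sketch the quantities `H.T @ H` and `H.T @ T` are sums over training examples, so they could in principle be accumulated over separate data shards and then added together, which is one way a batch-mode fit lends itself to parallel computation.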