Chapter 13: Recurrent Neural Networks and Related Models

  • Li Deng,
  • Dong Yu

in Automatic Speech Recognition --- A Deep Learning Approach

Published by Springer | 2014

A recurrent neural network (RNN) is a class of neural network models in which many connections among the neurons form a directed cycle. This gives rise to internal states, or memory, in the RNN, endowing it with dynamic temporal behavior not exhibited by the DNN discussed in earlier chapters. In this chapter, we first present the state-space formulation of the basic RNN as a nonlinear dynamical system, where the recurrent matrix governing the system dynamics is largely unstructured. For such basic RNNs, we describe two algorithms for learning their parameters in some detail: (1) the most popular algorithm of backpropagation through time (BPTT); and (2) a more rigorous, primal-dual optimization technique, in which constraints on the RNN’s recurrent matrix are imposed to guarantee stability during RNN learning. Going beyond basic RNNs, we further study an advanced version of the RNN, which exploits the structure called long short-term memory (LSTM), and analyze its strengths over the basic RNN both in terms of model construction and of practical applications, including some of the latest speech recognition results. Finally, we analyze the RNN as a bottom-up, discriminative, dynamic system model against the top-down, generative counterpart of dynamic system as discussed in Chapter ??. The analysis and discussion lead to potentially more effective and advanced RNN-like architectures and learning paradigms in which the strengths of discriminative and generative modeling are integrated while their respective weaknesses are overcome.
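As a minimal sketch of the state-space view described above, the following NumPy code implements one common form of the basic RNN recurrence, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), and unrolls it over a short input sequence. The dimensions, weight names, and random inputs here are illustrative assumptions, not values from the chapter; the unstructured recurrent matrix W_hh plays the role of the recurrent matrix governing the system dynamics.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One state-space update of a basic RNN:
    h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Illustrative sizes and weights (assumed for this sketch).
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 4, 5
W_xh = 0.1 * rng.standard_normal((hidden_dim, input_dim))
W_hh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))  # recurrent matrix
b_h = np.zeros(hidden_dim)

# Unroll the recurrence over T time steps; h carries the internal state.
h = np.zeros(hidden_dim)
for t in range(T):
    x_t = rng.standard_normal(input_dim)
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (4,)
```

Unrolling the loop in this way is also what BPTT exploits: gradients are propagated backward through the same chain of `rnn_step` applications, which is why the spectral properties of W_hh matter for stability during learning.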