Chapter 8: Deep Neural Network Sequence-Discriminative Training

  • Dong Yu,
  • Li Deng

in Automatic Speech Recognition --- A Deep Learning Approach

Published by Springer | 2014

The cross-entropy criterion discussed in the previous chapters treats each frame independently. Speech recognition, however, is a sequence classification problem. In this chapter, we introduce sequence-discriminative training techniques that better match this problem. We describe the popular maximum mutual information (MMI), boosted MMI (BMMI), minimum phone error (MPE), and minimum Bayes risk (MBR) training criteria, and discuss the practical techniques that make DNN sequence-discriminative training effective, including lattice generation, lattice compensation, frame dropping, frame smoothing, and learning rate adjustment.
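
To make the contrast between frame-level and sequence-level training concrete, the MMI criterion named above is commonly written in roughly the following form. This is a sketch in one widely used notation, not necessarily the exact notation of the chapter. The symbols are assumptions introduced here for illustration: o^m is the observation sequence of utterance m, w^m its reference transcription, s^m the state sequence corresponding to w^m, s^w the state sequence for a competing hypothesis w, \kappa an acoustic scaling factor, \theta the model parameters, and M the number of training utterances.

```latex
% Frame-level cross-entropy: each frame t contributes independently.
J_{\mathrm{CE}}(\theta) = -\sum_{m=1}^{M} \sum_{t=1}^{T_m}
    \log P_\theta\!\left(s^m_t \mid o^m_t\right)

% Sequence-level MMI: the whole reference sequence is scored against
% all competing word sequences w.
J_{\mathrm{MMI}}(\theta) = \sum_{m=1}^{M}
    \log \frac{p_\theta\!\left(o^m \mid s^m\right)^{\kappa} P\!\left(w^m\right)}
              {\sum_{w} p_\theta\!\left(o^m \mid s^{w}\right)^{\kappa} P(w)}
```

The sum over all competing word sequences in the denominator is intractable to compute exactly, so in practice it is typically restricted to the hypotheses contained in a lattice. This is one reason lattice generation appears among the practical techniques listed above.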