CHAPTER 1.2 Deep Discriminative and Generative Models for Speech Pattern Recognition

  • Li Deng
  • Navdeep Jaitly

in Handbook of Pattern Recognition and Computer Vision (Ed. C.H. Chen)

Published by World Scientific | 2016

In this chapter we describe deep generative and discriminative models as they have been applied to speech recognition and related pattern recognition problems. Generative models describe the distribution of the data, or the joint distribution of the data and the corresponding targets, whereas discriminative models describe the distribution of the targets conditioned on the data. Both kinds of model are called ‘deep’ because they use layers of latent or hidden variables. Understanding and exploiting the tradeoffs between deep generative and discriminative models is a fascinating area of research, and it forms the background of this chapter. We focus on speech recognition, but our analysis is applicable to other domains. We suggest ways in which deep generative models can be beneficially integrated with deep discriminative models based on their respective strengths. We also examine recent advances in end-to-end optimization, a hallmark of deep learning that differentiates it from most standard pattern recognition practices.