Li Deng and Dong Yu
This book is aimed to provide an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria: 1) expertise or knowledge of the authors; 2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and 3) the application areas that have the potential to be impacted significantly by deep learning and that have gained concentrated research efforts, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
In Chapter 1, we provide the background of deep learning, as intrinsically connected to the use of multiple layers of nonlinear transformations to derive features from the sensory signals such as speech and visual images. In the most recent literature, deep learning is embodied also as representation learning, which involves a hierarchy of features or concepts where higher-level representations of them are defined from lower-level ones and where the same lower-level representations help to define higher-level ones. In Chapter 2, a brief historical account of deep learning is presented. In particular, selected chronological development of speech recognition is used to illustrate the recent impact of deep learning that has become a dominant technology in speech recognition industry within only a few years since the start of a collaboration between academic and industrial researchers in applying deep learning to speech recognition. In Chapter 3, a three-way classification scheme for a large body of work in deep learning is developed. We classify a growing number of deep learning techniques into unsupervised, supervised, and hybrid categories, and present qualitative descriptions and a literature survey for each category. From Chapter 4 to Chapter 6, we discuss in detail three popular deep networks and related learning methods, one in each category. Chapter 4 is devoted to deep autoencoders as a prominent example of the unsupervised deep learning techniques. Chapter 5 gives a major example in the hybrid deep network category, which is the discriminative feed-forward neural network for supervised learning with many layers initialized using layer-by-layer generative, unsupervised pre-training. In Chapter 6, deep stacking networks and several of the variants are discussed in detail, which exemplify the discriminative or supervised deep learning techniques in the three-way categorization scheme.
In Chapters 7-11, we select a set of typical and successful applications of deep learning in diverse areas of signal and information processing and of applied artificial intelligence. In Chapter 7, we review the applications of deep learning to speech and audio processing, with emphasis on speech recognition organized according to several prominent themes. In Chapters 8, we present recent results of applying deep learning to language modeling and natural language processing. Chapter 9 is devoted to selected applications of deep learning to information retrieval including Web search. In Chapter 10, we cover selected applications of deep learning to image object recognition in computer vision. Selected applications of deep learning to multi-modal processing and multi-task learning are reviewed in Chapter 11. Finally, an epilogue is given in Chapter 12 to summarize what we presented in earlier chapters and to discuss future challenges and directions.