Abstract: I will start by explaining how deep belief nets can be learned one layer at a time without using any label information. I will then present evidence that this type of "pre-training" creates excellent features for the hidden layers of deep, feedforward neural networks that are then fine-tuned with backpropagation. The pre-training greatly reduces overfitting, especially when additional, unlabeled data is available, and it also makes the optimization much easier. I will then describe several different types of units that can be used in deep belief nets, and several different learning algorithms that can be used for the pre-training and fine-tuning. Finally, I will briefly describe a variety of applications, including phone recognition, in which deep belief nets have outperformed other methods.
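As a minimal sketch (not the speaker's own code), the layer-at-a-time pre-training described above can be illustrated by greedily stacking restricted Boltzmann machines, each trained with one-step contrastive divergence on unlabeled data; the layer sizes, learning rate, and epoch count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Binary restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_vis, n_hid, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b_vis = np.zeros(n_vis)
        self.b_hid = np.zeros(n_hid)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0):
        # Positive phase: sample hidden units given the data.
        h0_p = self.hidden_probs(v0)
        h0 = (rng.random(h0_p.shape) < h0_p).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction.
        v1_p = self.visible_probs(h0)
        h1_p = self.hidden_probs(v1_p)
        # Contrastive-divergence parameter updates.
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0_p - v1_p.T @ h1_p) / batch
        self.b_vis += self.lr * (v0 - v1_p).mean(axis=0)
        self.b_hid += self.lr * (h0_p - h1_p).mean(axis=0)
        return np.mean((v0 - v1_p) ** 2)  # reconstruction error

# Unlabeled binary data: no labels are used anywhere below.
data = (rng.random((200, 12)) < 0.3).astype(float)

# Greedy layer-wise stacking: train layer 1 on the data, then
# train layer 2 on layer 1's hidden activations, and so on.
layer_sizes = [12, 8, 4]  # illustrative sizes
stack, inputs = [], data
for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_vis, n_hid)
    for epoch in range(50):
        rbm.cd1_step(inputs)
    stack.append(rbm)
    inputs = rbm.hidden_probs(inputs)  # features for the next layer

print(len(stack), inputs.shape)
```

In the full recipe, the stacked weights would then initialize a feedforward network that is fine-tuned with backpropagation on the labeled data.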
Abstract: The currently dominant technology in speech recognition is based on the hidden Markov model (HMM), a shallow, two-layer architecture that has been carefully engineered over nearly 30 years, yet whose performance remains far below that of human speech recognition. Researchers have recognized fundamental limitations of this architecture and have made a multitude of attempts to develop "deeper" computational architectures for acoustic models in speech recognition aimed at overcoming those limitations. These research efforts have been largely isolated in the past, and in this overview talk we intend to provide a fresh look at this rich body of work and analyze it within a common machine learning framework. The topics to be covered include: 1) the multi-level, detection-based framework; 2) structured speech models (super-segmental or hidden dynamic models); 3) the tandem neural network architecture; 4) the layered neural network architecture; 5) hierarchical conditional random fields; and 6) deep-structured conditional random fields. Based on the analysis of these "beyond-HMM" architectures, we discuss future directions in speech recognition.