Over the past 25 years or so, speech recognition technology has been
dominated by a “shallow” architecture: the hidden Markov model (HMM).
Significant technological success has been achieved using complex and carefully
engineered variants of HMMs. The next generation of the technology requires
solutions to the technical challenges that remain across diverse deployment
environments. These challenges, not adequately addressed in the past, arise from
the many types of variability present in the speech generation process.
Overcoming these challenges is likely to require “deep” architectures with
efficient learning algorithms.
For speech recognition and related sequential pattern recognition applications, some attempts have been made in the past to develop computational architectures that are “deeper” than conventional HMMs, such as hierarchical HMMs, hierarchical point-process models, hidden dynamic models, and multi-level detection-based architectures. While positive recognition results have been reported, there has been a conspicuous lack of systematic learning techniques and theoretical guidance to facilitate the development of these deep architectures. Further, there has been virtually no effective communication between machine learning researchers and speech recognition researchers, both of whom advocate the use of deep architectures and learning. One goal of the proposed workshop is to bring these two groups together to review the progress in both fields and to identify promising, synergistic research directions for future cross-fertilization and collaboration.