Over the past 25 years or so, speech recognition technology has been
dominated by a “shallow” architecture: hidden Markov models (HMMs).
Significant technological success has been achieved using complex and carefully
engineered variants of HMMs. The next generation of the technology requires
solutions to the technical challenges that remain across diverse deployment
environments. These challenges, not adequately addressed in the past, arise from
the many types of variability present in the speech generation process.
Overcoming these challenges is likely to require “deep” architectures with
efficient learning algorithms.

For speech recognition and related sequential pattern recognition applications,
some attempts have been made in the past to develop computational architectures
that are “deeper” than conventional HMMs, such as hierarchical HMMs,
hierarchical point-process models, hidden dynamic models, and multi-level
detection-based architectures. While positive recognition results have been
reported, there has been a conspicuous lack of systematic learning techniques
and theoretical guidance to facilitate the development of these deep
architectures. Further, there has been virtually no effective communication
between machine learning researchers and speech recognition researchers, both of
whom advocate the use of deep architectures and learning. One goal of the
proposed workshop is to bring together these two groups of researchers to review
the progress in both fields and to identify promising and synergistic research
directions for potential future cross-fertilization and collaboration.