Notice: Deadline Extension

The submission deadline has been extended to Sept. 22, 2010 to accommodate related papers that are nearly finished.


We define deep learning techniques as machine learning techniques that involve at least three processing steps from the input to the output. The model parameters in deep learning systems should be learned rather than set manually. We encourage submissions related to new models such as deep belief networks, as well as recent advances in earlier models such as hierarchical HMMs, hierarchical point-process models, hidden dynamic models, tandem architectures, multi-level detection-based architectures, and hierarchical/deep-structured conditional random fields. However, submissions must have a specific connection to audio, speech, and/or language processing.

Call for papers in PDF and Text formats.

Call for Papers:

Over the past 25 years or so, speech recognition technology has been dominated largely by hidden Markov models (HMMs). Significant technological success has been achieved using complex and carefully engineered variants of HMMs. Next-generation technologies require solutions to the technical challenges presented by diversified deployment environments. These challenges arise from the many types of variability present in the speech signal itself. Overcoming these challenges is likely to require "deep" architectures with efficient and effective learning algorithms.

There are three main characteristics in the deep learning paradigm: 1) layered architecture; 2) generative modeling at the lower layer(s); and 3) unsupervised learning at the lower layer(s) in general. For speech and language processing and related sequential pattern recognition applications, some attempts have been made in the past to develop layered computational architectures that are "deeper" than conventional HMMs, such as hierarchical HMMs, hierarchical point-process models, hidden dynamic models, layered multilayer perceptrons, tandem-architecture neural-net feature extraction, multi-level detection-based architectures, deep belief networks, hierarchical conditional random fields, and deep-structured conditional random fields. While positive recognition results have been reported, there has been a conspicuous lack of systematic learning techniques and theoretical guidance to facilitate the development of these deep architectures. Recent communication between machine learning researchers and speech and language processing researchers has revealed a wealth of research results pertaining to insightful applications of deep learning to some classical speech recognition and language processing problems. These results can potentially further advance the state of the art in speech and language processing.

In light of the substantial research activity that has already taken place in this exciting space, and its importance, we invite papers describing various aspects of deep learning and related techniques/architectures as well as their successful applications to speech and language processing. Submissions must not have been previously published, with the exception that substantial extensions of conference or workshop papers will be considered.

Submissions must have a specific connection to audio, speech, and/or language processing. Topics of particular interest include, but are not limited to:

The authors are required to follow the Author's Guide for manuscript submission to the IEEE Transactions on Audio, Speech, and Language Processing at
