Papers are listed in the order they were submitted. You can download all the papers in one zip file.

Accepted Papers

Title:

MCMC for Hierarchical Semi-Markov Conditional Random Fields

Abstract:

Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. However, inference can be expensive for problems with arbitrary sequence length and depth. In this contribution, we propose a new approximation technique with the potential to achieve sub-cubic time complexity in both length and depth, at the cost of some controllable loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide a simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.

Author Names:

Truyen Tran*, Curtin University of Technology
Dinh Phung
Svetha Venkatesh
Hung Bui

Files:

full paper in pdf


Title:

Learning in the Deep-Structured Conditional Random Fields

Abstract:

We recently proposed deep-structured conditional random fields (CRFs) for sequential labeling and classification. The core of this model is its deep structure and its discriminative nature. This paper outlines the learning strategies and algorithms we have developed for deep-structured CRFs, with a focus on a new strategy that combines layer-wise unsupervised pre-training using entropy-based multi-objective optimization with conditional likelihood-based back-propagation fine-tuning, as inspired by recent developments in learning deep belief networks.

Author Names:

Dong Yu*, Microsoft Research
Li Deng, Microsoft Research
Shizhen Wang, University of California, Los Angeles

Files:


Title:

A Hierarchy of Recurrent Networks for Speech Recognition

Abstract:

Generative models for sequential data based on directed graphs of Restricted Boltzmann Machines (RBMs) have recently been shown to accurately model high-dimensional sequences. In these models, temporal dependencies in the input are discovered either by buffering previous visible variables or by recurrent connections of the hidden variables. Here we propose a modification of these models, the Temporal Reservoir Machine (TRM). It utilizes a recurrent artificial neural network (ANN) for integrating information from the input over time. This information is then fed into an RBM at each time step. To avoid the difficulties of recurrent network learning, the ANN remains untrained and can hence be thought of as a random feature extractor. Using the architecture of multi-layer RBMs (Deep Belief Networks), TRMs can be used as building blocks for complex hierarchical models. This approach unifies RBM-based approaches to sequential data modeling and the Echo State Network, a powerful approach to black-box system identification. The TRM is tested on a spoken-digits task under noisy conditions, and competitive performance compared to previous models is observed.

Author Names:

Benjamin Schrauwen*, Ghent University
Lars Buesing, Graz University of Technology

Files:


Title:

Competitive Learning for Deep Temporal Networks

Abstract:

We propose the use of competitive learning in deep networks for understanding sequential data. Hierarchies of competitive learning algorithms have been found in the brain [1], and their use in deep vision networks has been validated [2]. The algorithm is simple to comprehend yet provides fast, sparse learning. To understand temporal patterns, we use the depth of the network and delay blocks to encode time. The delayed feedback from higher layers provides meaningful predictions to lower layers. We evaluate a multi-factor network design by using it to predict frames in movies it has never seen before. On this task, our system outperforms the prediction of the Recurrent Temporal Restricted Boltzmann Machine [3] on novel frame changes.

Author Names:

Robert Gens*, University of Washington
Pedro Domingos, University of Washington

Files:


Title:

A Deep Learning Architecture Comprising Homogeneous Cortical Circuits for Scalable Spatiotemporal Pattern Inference

Abstract:

A key challenge in the design of scalable deep learning architectures is efficiently capturing spatiotemporal dependencies in a modality-independent framework. This paper presents a novel discriminative deep learning architecture that relies on an identical cortical circuit populating the hierarchical structure. Belief states formed across the hierarchy intrinsically capture sequences of patterns, rather than static patterns, thereby facilitating the embedding of temporal dependencies. At the core of the adaptation mechanism are two learned constructs, one of which relies on fast and stable incremental clustering. Moreover, the proposed methodology does not require layer-by-layer training and lends itself naturally to massively parallel processing platforms. A simple test case demonstrates the validity of the architecture and learning algorithm. The system can be efficiently applied to various modalities, including those associated with complex visual and audio information representation.

Author Names:

Itamar Arel*, University of Tennessee
Derek Rose, University of Tennessee
Tom Karnowski, University of Tennessee

Files:


Title:

Deep Belief Networks for phone recognition

Abstract:

Hidden Markov Models (HMMs) have been the state-of-the-art technique for acoustic modeling despite their unrealistic independence assumptions and the very limited representational capacity of their hidden states. There are many proposals in the research community for deeper models that are capable of modeling the many types of variability present in the speech generation process. Deep Belief Networks (DBNs) have recently proved to be very effective for a variety of machine learning problems, and this paper applies DBNs to acoustic modeling. On the standard TIMIT corpus, DBNs consistently outperform other techniques, and the best DBN achieves a phone error rate (PER) of 23.0% on the TIMIT core test set.

Author Names:

Abdel-rahman Mohamed*, University of Toronto
George Dahl, University of Toronto
Geoffrey Hinton, University of Toronto

Files:


Title:

Deep Learning For Semantic Parsing

Abstract:

Recently, Poon and Domingos (2009) developed the first approach for unsupervised semantic parsing, the USP system. They applied it to extracting a knowledge base from biomedical abstracts for question answering and showed that it substantially outperforms state-of-the-art systems such as TextRunner and DIRT. In this paper, we show that USP can be viewed as learning a deep network for semantic parsing. The hidden units in the network represent clusters of meaning expressions, whereas the visible units represent dependency trees of input sentences. USP starts with a network where each atomic expression has its own cluster, and learns the final architecture by incrementally combining hidden units to abstract away syntactic and lexical variations of the same meaning. USP can be naturally generalized to a new approach for deep learning based on structure search; we discuss the implications of this.

Author Names:

Hoifung Poon*, Univ. of Washington (CSE)
Pedro Domingos, University of Washington

Files:


Title:

Neural conditional random fields

Abstract:

We propose a non-linear graphical model for structured prediction. It combines the power of deep networks to extract high level features with the graphical framework of Markov networks, yielding a powerful and scalable model that we apply to signal labeling tasks.

Author Names:

Trinh-Minh-Tri Do, LIP6-UPMC
Thierry Artieres*, LIP6 - UPMC

Files:


Title:

A Multi-Objective Programming-Based Approach to Language Model Adaptation

Abstract:

The overall objective function of a MAP-based language model (LM) adaptation technique is implicitly a composition of two objective functions: the first is concerned with the maximum likelihood estimation of the model parameters from the in-domain data, while the second is concerned with an appropriate representation of prior information obtained from a general-purpose corpus. In this paper, we separate these individual objective functions, which are at least partially conflicting, and take a multi-objective programming (MOP) approach to LM adaptation. The resulting MOP problem is solved iteratively, optimizing each objective in turn with constraints on the others. When solved this way, the target LM takes the form of a log-linear interpolation of component LMs. In our preliminary experiments with bigram LMs, the proposed approach slightly outperformed linear interpolation. In our ongoing work with trigram LMs, we expect the proposed approach to outperform linear interpolation in terms of both perplexity and automatic speech recognition word error rate.

Author Names:

Sibel Yaman*, ICSI
Marco Siniscalchi
Chin-Hui Lee

Files: full paper in pdf

Title:

Deep learning for spoken language identification

Abstract:

Empirical results have shown that many spoken language identification systems based on hand-coded features perform poorly on small speech samples where a human would be successful. A hypothesis for this low performance is that the set of extracted features is insufficient. A deep architecture that learns features automatically is implemented and evaluated on several datasets.

Author Names:

Grégoire Montavon, Berlin Institute of Technology

Files: full paper in pdf

Title:

Unsupervised feature learning for audio classification using convolutional deep belief networks

Abstract:

In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks.

Author Names:

Honglak Lee, Stanford University
Yan Largman, Stanford University
Peter Pham, Stanford University
Andrew Y. Ng, Stanford University

Files: full paper in pdf
