NIPS 2008 WORKSHOP
Speech and Language: Learning-based Methods and Systems
Friday, December 12, 2008
Whistler, British Columbia, Canada
ACCEPTED PAPERS
Title: The Maximum Entropy Model with Continuous Features
Authors: Dong Yu, Li Deng, and Alex Acero
Abstract:
We present the maximum entropy (MaxEnt) model with continuous features. We
show that for continuous features the weights should be continuous functions
instead of single values. We propose a spline-interpolation-based solution to
the optimization problem that contains continuous weights, and illustrate that
the optimization problem can be converted into a standard log-linear one,
without continuous weights, in a higher-dimensional space.
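The conversion to a higher-dimensional log-linear model can be illustrated with a small sketch. This is not the authors' formulation: it assumes piecewise-linear (hat-function) interpolation in place of the spline, and the names hat_basis, expand_feature, and weighted_score, as well as the knot positions, are hypothetical. If the weight function is approximated as w(x) = sum_k w_k h_k(x), then w(x) * x = sum_k w_k (h_k(x) * x), so one continuous feature x becomes K ordinary features h_k(x) * x with scalar weights w_k.

```python
from bisect import bisect_right


def hat_basis(x, knots):
    """Piecewise-linear interpolation coefficients of x over sorted knots.

    Assumption: linear interpolation stands in for the spline interpolation
    mentioned in the abstract. Values outside the knot range are clamped.
    """
    x = min(max(x, knots[0]), knots[-1])
    # Index of the left knot of the interval that contains x.
    left = min(max(bisect_right(knots, x) - 1, 0), len(knots) - 2)
    t = (x - knots[left]) / (knots[left + 1] - knots[left])
    coeffs = [0.0] * len(knots)
    coeffs[left] = 1.0 - t
    coeffs[left + 1] = t
    return coeffs


def expand_feature(x, knots):
    """Map one continuous feature value x to K features h_k(x) * x."""
    return [h * x for h in hat_basis(x, knots)]


def weighted_score(x, knots, knot_weights):
    """Log-linear score contribution: sum_k w_k * h_k(x) * x."""
    return sum(w * f for w, f in zip(knot_weights, expand_feature(x, knots)))


if __name__ == "__main__":
    knots = [0.0, 1.0, 2.0]          # hypothetical knot positions
    knot_weights = [2.0, 4.0, 6.0]   # hypothetical scalar weights at the knots
    print(expand_feature(0.5, knots))
    print(weighted_score(0.5, knots, knot_weights))
```

Under these assumptions, the expanded features can be passed to any standard MaxEnt trainer, which then only has to estimate the scalar weights at the knots.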
Authors: Sangyun Hahn and Mari Ostendorf
Abstract:
Recently, semi-supervised learning has been an active research topic in the
natural language processing community, both to save effort in hand-labeling for
data-driven learning and to exploit the large amount of readily available
unlabeled text. In this paper, we apply EM-based semi-supervised learning
algorithms such as traditional EM, co-EM, and cross-validation EM to the task of
agreement/disagreement classification of multi-party conversational speech,
using discriminative models such as support vector machines and multi-layer
perceptrons. We experimentally compare the algorithms and discuss their
advantages and weaknesses when used with different amounts of unlabeled data.
Title: Learning Methods in Multilingual Speech Recognition
Authors: Hui Lin, Li Deng, Jasha Droppo, Dong Yu, and Alex Acero
Abstract:
One key issue in developing learning methods for multilingual acoustic modeling
in large vocabulary automatic speech recognition (ASR) applications is to
maximize the benefit of boosting the acoustic training data from multiple source
languages while minimizing the negative effects of data impurity arising from
language “mismatch”. In this paper, we introduce two learning methods,
semi-automatic unit selection and global phonetic decision tree, to address this
issue via effective utilization of acoustic data from multiple languages. The
semi-automatic unit selection aims to combine the merits of both data-driven and
knowledge-driven approaches to identifying the basic units in multilingual
acoustic modeling. The global decision-tree method allows clustering of
cross-center phones and cross-center states in the HMMs, offering the potential
to discover a better sharing structure beneath the mixed acoustic dynamics and
context mismatch caused by the use of multiple languages’ acoustic data. Our
preliminary experimental results show that both of these learning methods
improve the performance of multilingual speech recognition.
Title: Unsupervised Audio Speech Segmentation Using the Voting Experts Algorithm
Authors: Matthew Miller and Alexander Stoytchev
Abstract:
Human beings have an apparently innate ability to segment continuous audio
speech into words, and that ability is present in infants as young as 8 months
old. This propensity towards audio segmentation seems to lay the groundwork for
language learning in human beings. To artificially reproduce this ability would
be both practically useful and theoretically enlightening. In this paper we
propose an algorithm for the unsupervised segmentation of audio speech, based on
the Voting Experts (VE) algorithm, which was originally designed to segment
sequences of discrete tokens into categorical episodes. We demonstrate that our
procedure is capable of inducing breaks with an accuracy substantially greater
than chance, and suggest possible avenues of exploration to further increase the
segmentation quality. We also show that this algorithm can reproduce results
obtained from segmentation experiments performed with 8-month-old infants.
Title: System Combination for Machine Translation Using N-Gram Posterior Probabilities
Authors: Yong Zhao and Xiaodong He
Abstract:
This paper proposes using n-gram posterior probabilities, which are estimated
over translation hypotheses from multiple machine translation (MT) systems, to
improve the performance of system combination. Two ways of using n-gram
posteriors in confusion network decoding are presented. The first is based on an
n-gram posterior language model per source sentence, and the second, called
n-gram segment voting, boosts word posterior probabilities with n-gram
occurrence frequencies. The two n-gram posterior methods are incorporated in the
confusion network as individual features of a log-linear combination model.
Experiments on the Chinese-to-English MT task show that both methods yield
significant improvements in translation performance, and a combination of these
two features produces the best translation performance.
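As a rough illustration of estimating n-gram posteriors over hypotheses from several MT systems, the sketch below sums, for each n-gram, the normalized weight of every hypothesis that contains it. The abstract does not give the exact estimator, so this definition, the uniform default weights, and the names ngrams and ngram_posteriors are assumptions made for illustration only.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_posteriors(hypotheses, n, weights=None):
    """Estimate n-gram posteriors for one source sentence.

    hypotheses: translation strings, e.g. one per MT system.
    weights: optional non-negative hypothesis scores; uniform if omitted
             (an assumption, since the abstract does not specify them).
    Each n-gram receives the total normalized weight of the hypotheses
    in which it occurs at least once.
    """
    if weights is None:
        weights = [1.0] * len(hypotheses)
    total = sum(weights)
    posteriors = defaultdict(float)
    for hyp, weight in zip(hypotheses, weights):
        # set() so that a repeated n-gram counts once per hypothesis
        for gram in set(ngrams(hyp.split(), n)):
            posteriors[gram] += weight / total
    return dict(posteriors)


if __name__ == "__main__":
    hyps = ["the cat sat", "the cat sits", "a cat sat"]  # toy hypotheses
    for gram, p in sorted(ngram_posteriors(hyps, 2).items()):
        print(" ".join(gram), round(p, 3))
```

Posteriors of this kind could then serve as feature values in a log-linear combination model, in the spirit of the two features the abstract describes.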