Justine Kao, Geoffrey Zweig, and Patrick Nguyen
2011
This paper describes a new approach to modeling duration
for LVCSR using SCARF, a toolkit for speech recognition
with segmental conditional random fields. We utilize
SCARF’s ability to integrate long-span, segment-level
features to design and test duration models that help
discriminate between correct and incorrect word hypotheses.
We show that the duration distributions of correct and
incorrect word hypotheses differ. Given a word hypothesis
in the lattice and its duration, conditional length
probabilities are integrated into the SCARF system as duration
features. We evaluate three kinds of duration features on
Broadcast News: word, pre- and post-pausal durations, and
word span confusions. Adding the duration features to
SCARF yields up to a 0.3% improvement over a state-of-the-art
discriminatively trained baseline of 15.3% WER
on a Broadcast News task.
In ICASSP
Publisher IEEE
Type Inproceedings