Discriminative Duration Modeling for Speech Recognition with Segmental Conditional Random Fields

Justine Kao, Geoffrey Zweig, and Patrick Nguyen

Abstract

This paper describes a new approach to modeling duration for LVCSR using SCARF, a toolkit for speech recognition with segmental conditional random fields. We use SCARF’s ability to integrate long-span, segment-level features to design and test duration models that help discriminate between correct and incorrect word hypotheses. We show that the duration distributions of correct and incorrect word hypotheses differ. Given a word hypothesis in the lattice and its duration, conditional length probabilities are integrated into the SCARF system as duration features. We evaluate three kinds of duration features on Broadcast News: word durations, pre- and post-pausal durations, and word span confusions. Adding the duration features to SCARF yields an improvement of up to 0.3% over a state-of-the-art discriminatively trained baseline of 15.3% WER on a Broadcast News task.
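To make the idea of a conditional length probability concrete, the following is a minimal sketch, not the paper's implementation. It assumes durations are measured in frames, estimates smoothed duration histograms separately for correct and incorrect hypotheses of a single word, and combines them into a log-ratio feature. The function names, the add-alpha smoothing, the log-ratio form, and the toy durations are all illustrative assumptions; the abstract does not specify how the probabilities are estimated or how they enter SCARF.

```python
import math
from collections import Counter

def length_distribution(durations, max_len, alpha=1.0):
    """Add-alpha smoothed histogram over durations 1..max_len (frames).

    Durations longer than max_len are clipped into the last bin.
    (Smoothing scheme is an assumption, not taken from the paper.)
    """
    counts = Counter(min(d, max_len) for d in durations)
    total = len(durations) + alpha * max_len
    return {d: (counts[d] + alpha) / total for d in range(1, max_len + 1)}

def duration_feature(duration, p_correct, p_incorrect, max_len):
    """Hypothetical feature: log P(d | correct) - log P(d | incorrect)."""
    d = min(duration, max_len)
    return math.log(p_correct[d]) - math.log(p_incorrect[d])

# Toy durations for one word, invented purely for illustration.
correct_durs = [20, 22, 21, 19, 20, 23]
incorrect_durs = [5, 6, 40, 4, 45, 7]
MAX_LEN = 50

p_c = length_distribution(correct_durs, MAX_LEN)
p_i = length_distribution(incorrect_durs, MAX_LEN)

feat_typical = duration_feature(20, p_c, p_i, MAX_LEN)  # duration common among correct hypotheses
feat_odd = duration_feature(5, p_c, p_i, MAX_LEN)       # duration common among incorrect hypotheses
```

In this toy setup, a hypothesis whose duration is typical of correct instances receives a positive feature value, while one whose duration is typical of incorrect instances receives a negative value, which is the kind of signal a discriminatively trained model can weight.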

Details

Publication type: Inproceedings
Published in: ICASSP
Publisher: IEEE