Discriminative Duration Modeling for Speech Recognition with Segmental Conditional Random Fields

This paper describes a new approach to modeling duration for LVCSR using SCARF, a toolkit for speech recognition with segmental conditional random fields. We use SCARF's ability to integrate long-span, segment-level features to design and test duration models that help discriminate between correct and incorrect word hypotheses. We show that the duration distributions of correct and incorrect word hypotheses differ. Given a word hypothesis in the lattice and its duration, conditional length probabilities are integrated into the SCARF system as duration features. We evaluate three kinds of duration features on Broadcast News: word durations, pre- and post-pausal durations, and word span confusions. Adding the duration features to SCARF yields an improvement of up to 0.3% over a state-of-the-art discriminatively trained baseline of 15.3% WER on a Broadcast News task.
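The sketch below shows one way a conditional length probability feature, log P(d | w), could be computed for a word hypothesis w with duration d. It is a minimal illustration, not the paper's estimator or SCARF's feature interface. It makes three assumptions: durations are counted in frames (at least 1), per-word duration histograms are estimated from correct training alignments, and add-alpha smoothing is applied over a fixed number of bins. The class name `DurationModel` and the parameters `max_frames` and `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class DurationModel:
    """Hypothetical sketch: smoothed per-word duration distribution P(d | w)."""

    def __init__(self, max_frames=100, alpha=1.0):
        # Assumption: durations are integer frame counts >= 1; anything longer
        # than max_frames is clipped into the last bin, giving bins 1..max_frames.
        self.max_frames = max_frames
        # Assumption: add-alpha smoothing so unseen durations and words get mass.
        self.alpha = alpha
        self.counts = defaultdict(Counter)

    def _bin(self, frames):
        # Clip the duration into the range [1, max_frames].
        return max(1, min(frames, self.max_frames))

    def fit(self, examples):
        # examples: iterable of (word, duration_in_frames) from correct alignments.
        for word, frames in examples:
            self.counts[word][self._bin(frames)] += 1
        return self

    def log_prob(self, word, frames):
        # Smoothed log P(d | w); an unseen word falls back to a uniform distribution.
        # .get avoids inserting empty entries into the defaultdict.
        c = self.counts.get(word, Counter())
        total = sum(c.values())
        p = (c[self._bin(frames)] + self.alpha) / (total + self.alpha * self.max_frames)
        return math.log(p)


if __name__ == "__main__":
    model = DurationModel(max_frames=10).fit([("the", 3), ("the", 3), ("the", 4)])
    # A typical duration scores higher than an atypical one for the same word.
    print(model.log_prob("the", 3), model.log_prob("the", 9))
```

In this sketch, the value returned for each lattice arc would serve as one segment-level feature, and the segmental CRF would learn its weight during discriminative training. Because correct and incorrect hypotheses have different duration distributions, atypical durations receive low log-probabilities and can be penalized.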


In: ICASSP

Publisher: IEEE


Type: Inproceedings