Geoffrey Zweig and Patrick Nguyen
2009
This paper introduces a class of discriminative features for use
in maximum entropy speech recognition models. The features
we propose are acoustic detectors for discriminatively determined
multi-phone units. The multi-phone units are found by
computing the mutual information between the phonetic subsequences
that occur in the training lexicon and the word labels.
This quantity depends on an error model governing our ability
to detect phone sequences accurately (an otherwise informative
sequence that cannot be reliably detected is of little use).
We show how to compute this mutual information efficiently
under a class of error models, in a single pass over the
data, for all phonetic subsequences in the training data. After
this computation, detectors are created for a subset of highly informative
units. We then define two novel classes of features
based on these units: associative and transductive. A maximum
entropy based direct model for Voice-Search that incorporates
these features outperforms the baseline by 24% in sentence
error rate.
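The one-pass mutual information computation described above can be illustrated with a minimal sketch. It assumes a deliberately simple, hypothetical binary error model that is not taken from the paper: a detector for unit u fires with probability p_hit if the spoken word's pronunciation contains u and with probability p_fa otherwise, with p_hit = q ** len(u) so that longer units are detected less reliably. Under that assumption the mutual information between the detector output D and the word W depends only on m(u), the total prior probability of words containing u, so a single pass over the lexicon that accumulates m(u) suffices to score every subsequence. The names `unit_mass`, `detector_mi`, `rank_units`, `q`, `p_fa`, and `max_len` are all illustrative.

```python
import math
from collections import defaultdict


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) variable; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def unit_mass(lexicon, word_prob, max_len=3):
    """Single pass over the lexicon. For every contiguous phone subsequence u
    of length <= max_len, accumulate m(u) = sum of P(w) over words whose
    pronunciation contains u (each word counted once per unit)."""
    mass = defaultdict(float)
    for word, phones in lexicon.items():
        units = set()
        for i in range(len(phones)):
            for j in range(i + 1, min(i + max_len, len(phones)) + 1):
                units.add(tuple(phones[i:j]))
        for u in units:
            mass[u] += word_prob[word]
    return mass


def detector_mi(m, p_hit, p_fa):
    """I(D; W) in bits for a binary detector of a unit with mass m, assuming
    (hypothetically) P(D=1 | w) = p_hit if w contains the unit, else p_fa."""
    p_on = p_fa + (p_hit - p_fa) * m  # marginal P(D = 1)
    h_cond = m * binary_entropy(p_hit) + (1.0 - m) * binary_entropy(p_fa)
    return binary_entropy(p_on) - h_cond


def rank_units(lexicon, word_prob, max_len=3, q=0.9, p_fa=0.05):
    """Score every unit and sort by MI, highest first. Hypothetical error
    model: hit rate q ** len(u) decays with unit length; false-alarm rate fixed."""
    mass = unit_mass(lexicon, word_prob, max_len)
    scored = [(detector_mi(m, q ** len(u), p_fa), u) for u, m in mass.items()]
    scored.sort(reverse=True)
    return scored
```

For example, with a toy lexicon `{"cat": ("k", "ae", "t"), "cap": ("k", "ae", "p"), "bat": ("b", "ae", "t")}` and uniform word priors, the unit `("ae",)` occurs in every word, so its mass is 1 and its detector carries no information about the word, while units such as `("k", "ae", "t")` that single out one word score higher. Keeping the top of `rank_units` corresponds, under these assumptions, to choosing the "subset of highly informative units" for which detectors are built.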
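How a maximum entropy direct model turns unit detections into a word posterior can be sketched as a generic log-linear model: P(w | detections) = exp(sum of the weights of active features) / Z. This is only an illustration under stated assumptions. The abstract does not define the associative and transductive features, so the sketch uses a hypothetical feature per (unit, word) pair that fires when the unit is detected and w is the hypothesized word, and it takes the weights as given rather than training them. The function name `maxent_posterior` and the `weights` dictionary layout are illustrative.

```python
import math


def maxent_posterior(detected_units, candidates, weights):
    """Log-linear (maximum entropy) posterior over candidate words:
    P(w | detections) = exp(sum_u weights[(u, w)]) / Z, where the sum runs
    over detected units u. Each (u, w) key is a hypothetical binary feature
    that fires when unit u is detected and w is the hypothesized word."""
    scores = {
        w: sum(weights.get((u, w), 0.0) for u in detected_units)
        for w in candidates
    }
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(expd.values())  # normalizer Z over the candidate list
    return {w: e / z for w, e in expd.items()}
```

With no weights every candidate receives the same probability; giving the pair (`("k", "ae", "t")`, `"cat"`) a positive weight shifts probability toward "cat" whenever that unit is detected.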
In: Interspeech
Type: Inproceedings