Maximum Mutual Information Multi-phone Units in Direct Modeling

Geoffrey Zweig and Patrick Nguyen

Abstract

This paper introduces a class of discriminative features for use in maximum entropy speech recognition models. The features we propose are acoustic detectors for discriminatively determined multi-phone units. The multi-phone units are found by computing the mutual information between the phonetic subsequences that occur in the training lexicon and the word labels. This quantity is a function of an error model governing our ability to detect phone sequences accurately (an otherwise informative sequence that cannot be reliably detected is not so useful). We show how to compute this mutual information quantity efficiently, under a class of error models, in one pass over the data, for all phonetic subsequences in the training data. After this computation, detectors are created for a subset of highly informative units. We then define two novel classes of features based on these units: associative and transductive. Incorporating these features in a maximum entropy-based direct model for Voice-Search outperforms the baseline by 24% in sentence error rate.
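
To make the selection criterion concrete, below is a minimal Python sketch of the one-pass computation under one simple assumed error model: a detector that misses a present unit with probability p_miss and false-alarms on an absent unit with probability p_fa. The paper treats a broader class of error models; these parameter names and the exact model form are illustrative assumptions, not the paper's definitions. The key observation the sketch relies on is that, under such a model, the mutual information I(D_u; W) between the noisy detection event D_u and the word label W depends only on the total probability mass q_u of words whose pronunciation contains the unit u, and a single pass over the lexicon can accumulate q_u for every contiguous phone subsequence at once.

```python
import math
from collections import defaultdict

def score_units(lexicon, counts, p_miss=0.1, p_fa=0.01):
    """Score every contiguous phone subsequence u by I(D_u; W): the
    mutual information between a noisy detector for u and the word label.

    lexicon: dict mapping word -> tuple of phones (its pronunciation)
    counts:  dict mapping word -> training count (defines P(W))
    p_miss:  P(no detection | u present)  -- assumed error model
    p_fa:    P(detection    | u absent)   -- assumed error model
    """
    total = sum(counts.values())
    p_w = {w: c / total for w, c in counts.items()}

    # One pass over the lexicon: for each subsequence u, accumulate
    # q_u = P(u present) = sum of P(w) over words containing u.
    q = defaultdict(float)
    for w, phones in lexicon.items():
        subseqs = {phones[i:j]
                   for i in range(len(phones))
                   for j in range(i + 1, len(phones) + 1)}
        for u in subseqs:
            q[u] += p_w.get(w, 0.0)

    def hb(p):
        # Binary entropy in bits, with the 0 log 0 = 0 convention.
        return sum(-x * math.log2(x) for x in (p, 1 - p) if x > 0)

    scores = {}
    for u, qu in q.items():
        # Marginal detection probability under the error model.
        pd = qu * (1 - p_miss) + (1 - qu) * p_fa
        # H(D_u | W) depends only on whether u is present in a word,
        # so I(D_u; W) = H(D_u) - H(D_u | W) needs only q_u.
        scores[u] = hb(pd) - (qu * hb(p_miss) + (1 - qu) * hb(p_fa))
    return scores
```

As a toy check, with the lexicon {'cat': ('k','ae','t'), 'cap': ('k','ae','p')} and equal counts, the unit ('k','ae') occurs in every word, so its detection carries zero mutual information about the word label, while ('t',) distinguishes the two words and scores highest; unreliable detection (larger p_miss or p_fa) shrinks every score, capturing the abstract's point that an informative but undetectable sequence is not useful.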
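The abstract does not define the two feature classes in detail. As one hedged reading, the sketch below assumes that an associative feature ties a detected unit to one specific word, while a transductive feature fires whenever a detected unit also appears in the hypothesized word's pronunciation, and so can generalize to words never paired with that unit in training. These forms are illustrative assumptions, not the paper's exact definitions.

```python
def associative_feature(unit, word):
    """Assumed form: fires only for one specific (unit, word) pair,
    letting the model memorize word-specific acoustic evidence."""
    def f(detected_units, hyp_word, hyp_pron):
        return 1.0 if unit in detected_units and hyp_word == word else 0.0
    return f

def transductive_feature(unit):
    """Assumed form: fires when the detected unit occurs anywhere in
    the hypothesized pronunciation, generalizing across words."""
    k = len(unit)
    def f(detected_units, hyp_word, hyp_pron):
        in_pron = any(hyp_pron[i:i + k] == unit
                      for i in range(len(hyp_pron) - k + 1))
        return 1.0 if unit in detected_units and in_pron else 0.0
    return f
```

In a maximum entropy direct model, each such feature f_i receives a trained weight lambda_i, and the word posterior is proportional to exp of the weighted feature sum, which is the standard form such features would plug into.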

Details

Publication type: Inproceedings
Published in: Interspeech 2009
Publisher: International Speech Communication Association