How to Train a Discriminative Front End with Stochastic Gradient Descent and Maximum Mutual Information

Jasha Droppo, Milind Mahajan, Asela Gunawardana, and Alex Acero

Abstract

This paper presents a general discriminative training method for the front end of an automatic speech recognition system. The SPLICE (Stereo Piecewise Linear Compensation for Environment) parameters of the front end are trained using stochastic gradient descent (SGD) on a maximum mutual information (MMI) objective function. SPLICE is chosen for its ability to approximate both linear and non-linear transformations of the feature space; SGD is chosen for its simplicity of implementation. Results are presented on both the Aurora 2 small-vocabulary task and the WSJ Nov-92 medium-vocabulary task. The discriminative front end is shown to consistently increase system accuracy across different front end configurations and tasks.
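
As a rough illustration of the training recipe the abstract describes, the sketch below applies stochastic gradient ascent to a frame-level MMI-style criterion through a SPLICE-style piecewise-linear transform. Everything concrete in it is an assumption made for illustration: the dimensions, the fixed unit-variance GMM that supplies the SPLICE posteriors, the toy per-class Gaussian "acoustic model", and the frame-level objective (the paper trains against a full recognizer on Aurora 2 and WSJ, not this toy).

    # Minimal sketch (not the authors' code) of a SPLICE-style front end
    # trained by stochastic gradient ascent on a frame-level MMI-style
    # criterion. All sizes and models here are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    D, K, C = 13, 8, 2          # feature dim, SPLICE mixture size, #classes

    # SPLICE transform: y = x + sum_k p(k|x) r_k, posteriors from a fixed GMM.
    means = rng.normal(size=(K, D))   # GMM means (assumed fixed here)
    R = np.zeros((K, D))              # correction vectors to be trained

    def posteriors(x):
        # Responsibilities p(k|x) under a unit-variance diagonal GMM.
        logp = -0.5 * ((x - means) ** 2).sum(axis=1)
        p = np.exp(logp - logp.max())
        return p / p.sum()

    def splice(x):
        p = posteriors(x)
        return x + p @ R, p

    # Toy acoustic model: one unit-variance Gaussian per class.
    mu = rng.normal(size=(C, D))

    def mmi_grad(x, label):
        # Gradient of log p(label | y) with respect to R.
        y, p = splice(x)
        scores = -0.5 * ((y - mu) ** 2).sum(axis=1)   # class log-likelihoods
        post = np.exp(scores - scores.max())
        post /= post.sum()                            # class posteriors p(c|y)
        # d(objective)/dy: numerator term minus posterior-weighted denominator.
        dL_dy = (mu[label] - y) - post @ (mu - y)
        # Chain rule through y = x + p @ R (posteriors p treated as constant).
        return np.outer(p, dL_dy)

    # SGD over randomly sampled frames; ascent because MMI is maximized.
    lr = 0.05
    for _ in range(1000):
        c = rng.integers(C)
        x = mu[c] + rng.normal(size=D)   # noisy frame from class c
        R += lr * mmi_grad(x, c)

Holding the GMM posteriors fixed while updating only the correction vectors keeps each SGD step a simple outer product, which is one way to read the abstract's claim that SGD is attractive for its simplicity of implementation.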

Details

Publication type: Inproceedings
Published in: Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding
Address: Puerto Rico
Publisher: Institute of Electrical and Electronics Engineers, Inc.