Joint Discriminative Front End and Back End Training for Improved Speech Recognition Accuracy

Jasha Droppo; Alex Acero

Proc. ICASSP | May 2006

Published by Institute of Electrical and Electronics Engineers, Inc.

This paper presents a general discriminative training method for both the front end feature extractor and back end acoustic model of an automatic speech recognition system. The front end and back end parameters are jointly trained using the Rprop algorithm against a maximum mutual information (MMI) objective function. Results are presented on the Aurora 2 noisy English digit recognition task. It is shown that discriminative training of the front end or back end alone can improve accuracy, but joint training is considerably better.
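For readers unfamiliar with Rprop, the following is a minimal sketch of a sign-based update of the kind the abstract refers to, written here for maximizing an objective. It is illustrative only: the specific variant (iRprop-), the hyperparameter values, the function name `rprop_maximize`, and the toy quadratic objective standing in for the MMI criterion are all assumptions and are not taken from the paper.

```python
import numpy as np


def rprop_maximize(grad_fn, params, n_iters=200, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=1.0):
    """Maximize an objective with an iRprop- style update (hypothetical helper).

    Only the sign of each partial derivative is used; each parameter
    keeps its own step size, which grows while the gradient sign is
    stable and shrinks when the sign flips.
    """
    params = np.asarray(params, dtype=float).copy()
    step = np.full_like(params, step_init)
    prev_grad = np.zeros_like(params)

    for _ in range(n_iters):
        grad = grad_fn(params)
        agreement = grad * prev_grad

        # Same sign as the previous iteration: enlarge the step (capped).
        step = np.where(agreement > 0,
                        np.minimum(step * eta_plus, step_max), step)
        # Sign flipped: the last step overshot, so shrink the step (floored).
        step = np.where(agreement < 0,
                        np.maximum(step * eta_minus, step_min), step)
        # iRprop-: after a sign flip, skip the update for that parameter.
        grad = np.where(agreement < 0, 0.0, grad)

        # Gradient ascent on the sign only, since the objective is maximized.
        params += np.sign(grad) * step
        prev_grad = grad

    return params


# Toy stand-in objective (an assumption, not MMI): f(x) = -sum((x - target)^2).
target = np.array([3.0, -1.5, 0.25])


def toy_gradient(x):
    return -2.0 * (x - target)


result = rprop_maximize(toy_gradient, np.zeros(3))
```

Because the update depends only on gradient signs, parameters whose gradients differ widely in scale can share one optimizer without per-parameter learning-rate tuning. This property suits a setting where feature-extraction and acoustic-model parameters are trained together.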

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.